• If a Hive query fails, Amazon EMR retries it. The default retry interval is
2 minutes, but you can change it with the dynamodb.retry.duration parameter.
Specify the number of minutes after which Amazon EMR should retry a query
that has returned no response, as follows:
SET dynamodb.retry.duration = 5;
This sets the retry interval to 5 minutes.
• You can also improve import/export performance by increasing the number of
mappers. The number of mappers available to a Hadoop cluster depends on the
capacity of the hardware the cluster runs on, so you can either use more
powerful nodes or add more nodes. Note that to do this, you need to stop the
EMR cluster and change its configuration. You can also raise the number of
mappers by setting the mapred.tasktracker.map.tasks.maximum parameter. The
drawback of increasing this value is that it may cause out-of-memory errors on
the nodes in the EMR cluster. Because this parameter is specific to the EMR
cluster configuration, it cannot simply be set from the Hive console; you need
to set it through a bootstrap action. More information about bootstrap actions
is available at
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html.
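
The bootstrap action described above could be prepared along the following lines. This is a minimal Python sketch, not the book's own method: it only builds the dictionary for one bootstrap action and does not contact AWS. The script path and the "-m" argument form are assumptions based on the legacy configure-hadoop bootstrap action, and the function name mapper_bootstrap_action and the action label are hypothetical. Check all of them against the EMR version you are running.

```python
# Sketch: build a bootstrap-action entry that raises the per-node mapper limit.
# ASSUMPTION: the legacy configure-hadoop script and its "-m" (mapred-site) flag
# are available for your EMR AMI version; verify before relying on this.

CONFIGURE_HADOOP = "s3://elasticmapreduce/bootstrap-actions/configure-hadoop"  # assumed path


def mapper_bootstrap_action(max_map_tasks):
    """Return a dict shaped like one entry of a BootstrapActions list."""
    if not isinstance(max_map_tasks, int) or max_map_tasks < 1:
        raise ValueError("max_map_tasks must be a positive integer")
    return {
        "Name": "Raise mapper limit",  # hypothetical label
        "ScriptBootstrapAction": {
            "Path": CONFIGURE_HADOOP,
            # Key=value pair for the parameter named in the text above
            "Args": ["-m", f"mapred.tasktracker.map.tasks.maximum={max_map_tasks}"],
        },
    }


if __name__ == "__main__":
    # Print the entry so it can be inspected before a cluster is created.
    print(mapper_bootstrap_action(10))
```

A dictionary of this shape can be supplied in the BootstrapActions list when the cluster is created, for example through the run_job_flow call of the boto3 EMR client. Because a higher value may cause out-of-memory errors on the nodes, it is safer to raise it in small steps.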