Performance considerations while using EMR with DynamoDB
When we use EMR to import or export data, it consumes the table's provisioned throughput just like any other DynamoDB client. If you know the data size and the read and write capacity units provisioned for the table, it is easy to estimate how long it will take to import or export that data. For example, suppose the Employee table in DynamoDB needs to be exported to S3.
The table is 10 GB and has 50 read capacity units provisioned. Assuming each read capacity unit delivers 4 KB per second (50 × 4,096 = 204,800 bytes per second) and the export uses all of it, the time EMR needs to export the 10 GB of data is:
10,737,418,240 bytes (10 GB) / 204,800 bytes per second (50 read units) = 52,428.8 seconds ≈ 14.56 hours.
To reduce the time, we need to increase the read capacity units provisioned to the table.
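The arithmetic above can be reproduced with a short Python sketch. It assumes, as the 204,800-byte figure implies, that one read capacity unit delivers one strongly consistent read of up to 4 KB per second and that the export consumes all of it. The helper names `export_hours` and `read_units_needed` are hypothetical. The second function inverts the calculation to show how many read units a target export time would require.

```python
import math

# Assumption: one read capacity unit (RCU) = one strongly consistent read
# of up to 4 KB per second, the figure behind the 204,800-byte rate above.
BYTES_PER_RCU = 4 * 1024


def export_hours(table_bytes: int, read_units: int) -> float:
    """Estimate export time if the job could use every provisioned RCU."""
    bytes_per_second = read_units * BYTES_PER_RCU
    return table_bytes / bytes_per_second / 3600


def read_units_needed(table_bytes: int, target_hours: float) -> int:
    """Smallest whole number of RCUs that finishes within target_hours."""
    bytes_per_second = table_bytes / (target_hours * 3600)
    return math.ceil(bytes_per_second / BYTES_PER_RCU)


if __name__ == "__main__":
    employee_table = 10 * 1024**3  # 10 GB, as in the Employee example
    print(f"{export_hours(employee_table, 50):.2f} hours")
    print(read_units_needed(employee_table, 1.0))
```

Because the estimate is inversely proportional to the read units, doubling the provisioned read capacity halves the export time. Raising it to roughly 729 units would bring the 10 GB export under one hour in this model.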
Keep the following points in mind to get the best out of the DynamoDB and EMR integration.
• By default, EMR decides on its own how fast to fetch data, based on the capacity units provisioned for the table. If you see a high number of provisioned throughput exceeded exceptions, you can control the read rate with the parameter dynamodb.throughput.read.percent. Its value ranges from 0.1 to 1.5. The default is 0.5, which means that Hive will try to consume half of the read capacity units provisioned for the table. You can increase the value to improve performance, but keep watching the table's consumed capacity and throttled request metrics, and adjust the provisioned throughput accordingly.
Set the parameter in the Hive console before you execute the query, as follows:
SET dynamodb.throughput.read.percent=1.0;
• Similarly, we can set the write rate to control imports into a DynamoDB table. Here, we use the parameter dynamodb.throughput.write.percent, which also ranges from 0.1 to 1.5. As with the read rate, set it in the Hive console before running the query:
SET dynamodb.throughput.write.percent=1.2;
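To see how the read and write percent settings stretch or shrink such estimates, the sketch below treats the targeted rate as the provisioned units multiplied by the percent value. That model, the 4 KB-per-read-unit figure, and the helper names `target_units` and `estimated_hours` are assumptions for illustration, not the connector's actual pacing logic. For imports, pass `bytes_per_unit=WRITE_UNIT_BYTES`, because a DynamoDB write capacity unit covers one write of up to 1 KB per second.

```python
# Hypothetical model, for illustration only: the job is assumed to target
# (provisioned units x percent) capacity units per second. The real EMR
# DynamoDB connector's pacing logic may differ.

MIN_PERCENT, MAX_PERCENT = 0.1, 1.5  # range stated for both percent parameters
READ_UNIT_BYTES = 4 * 1024   # assumption: one strongly consistent 4 KB read/s
WRITE_UNIT_BYTES = 1 * 1024  # one write of up to 1 KB per second


def target_units(provisioned_units: int, percent: float) -> float:
    """Capacity units per second the job aims to consume."""
    if not MIN_PERCENT <= percent <= MAX_PERCENT:
        raise ValueError(f"percent must be between {MIN_PERCENT} and {MAX_PERCENT}")
    return provisioned_units * percent


def estimated_hours(table_bytes: int, provisioned_units: int,
                    percent: float = 0.5,
                    bytes_per_unit: int = READ_UNIT_BYTES) -> float:
    """Transfer time at the targeted rate; percent defaults to 0.5."""
    bytes_per_second = target_units(provisioned_units, percent) * bytes_per_unit
    return table_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    table_bytes = 10 * 1024**3  # the 10 GB Employee table
    for pct in (0.5, 1.0, 1.5):
        hours = estimated_hours(table_bytes, 50, pct)
        print(f"read percent {pct}: {hours:.2f} hours")
```

Under this model, leaving the read percent at its default of 0.5 doubles the full-rate export time for the Employee table. Values above 1.0 ask Hive to exceed the provisioned rate, which is why watching the throttled request metrics matters.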