Integrating DynamoDB with Other AWS Components - Mastering DynamoDB


Performance considerations while using EMR with DynamoDB

When we use EMR to import or export data, it consumes the provisioned throughput of the tables, just like any other DynamoDB client. If you know the data size and the read and write capacity units provisioned to the table, it is easy to estimate how long it will take to import or export a given amount of data to or from DynamoDB. For example, suppose the Employee table in DynamoDB needs to be exported to S3.

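The size of the table is 10 GB and the provisioned read capacity is 50 units. Each read capacity unit provides one strongly consistent read per second of up to 4 KB (4096 bytes), so 50 units allow 50 × 4096 = 204800 bytes per second. Assuming the export can use all of that capacity, the time EMR needs to export 10 GB of data will be:

10737418240 bytes (10 GB) / 204800 bytes per second (50 read units) = 52428.8 seconds ≈ 14.56 hours.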

To reduce the time, we need to increase the read capacity units provisioned to the table.
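The arithmetic above can be sketched in a few lines of Python. This is a minimal estimate that, like the calculation in the text, assumes 4 KB per read capacity unit per second and that the export can use the table's entire read capacity; it ignores the Hive read-rate setting and any throttling. The function name export_hours is illustrative only and is not part of any AWS API.

```python
# Sketch: estimate how long an EMR/Hive export of a DynamoDB table takes.
# Assumptions (same as the worked example): each read capacity unit delivers
# 4 KB per second, and the export uses the table's full read capacity.
READ_UNIT_BYTES = 4 * 1024  # bytes per read capacity unit per second


def export_hours(table_bytes: int, read_units: int) -> float:
    """Return the estimated export time in hours (hypothetical helper)."""
    bytes_per_second = read_units * READ_UNIT_BYTES
    seconds = table_bytes / bytes_per_second
    return seconds / 3600


table_size = 10 * 1024 ** 3  # 10 GB = 10737418240 bytes

# Estimated hours for a few capacity levels: 50 -> 14.56, 100 -> 7.28, 200 -> 3.64
for units in (50, 100, 200):
    print(f"{units:>4} read units: {export_hours(table_size, units):.2f} hours")
```

Because the estimate scales inversely with the read units, doubling the provisioned read capacity roughly halves the export time under these assumptions.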

The following are some important things to keep in mind in order to get the best out of the DynamoDB and EMR integration:

• By default, EMR decides on its own the rate at which data should be fetched, based on the capacity units provisioned for the table. However, if you are getting a high number of provisioned throughput exceeded errors, you can control the read rate by setting the parameter dynamodb.throughput.read.percent. The range for this parameter is from 0.1 to 1.5. The default read rate is 0.5, which means that Hive will attempt to consume half of the read capacity units provisioned to the table. You can increase the value if you need better performance, but make sure you keep watching the table's consumed capacity units and throttled request metrics, and adjust the provisioned throughput accordingly.

The parameter can be set in the Hive console before you execute the query, as follows:

SET dynamodb.throughput.read.percent=1.0;

• Similarly, you can set the write rate to control imports into a DynamoDB table. The parameter here is dynamodb.throughput.write.percent, whose range is also 0.1 to 1.5. As with the read rate, it must be set in the Hive console before you run the query, as follows:

SET dynamodb.throughput.write.percent=1.2;
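The two settings can be illustrated with a small Python sketch. The helpers set_statement and target_units are hypothetical, written only to show the 0.1 to 1.5 range check, the SET statement syntax used in the Hive console, and the approximate share of provisioned capacity Hive aims for (provisioned units multiplied by the percent). The actual rate is managed by EMR and may differ, so the table's consumed capacity and throttled request metrics remain the real guide.

```python
# Hypothetical helpers illustrating the dynamodb.throughput.*.percent settings.
MIN_PERCENT, MAX_PERCENT = 0.1, 1.5  # allowed range for both parameters
DEFAULT_PERCENT = 0.5  # default read rate: about half the provisioned capacity


def set_statement(kind: str, percent: float) -> str:
    """Return the Hive SET statement for kind 'read' or 'write'."""
    if kind not in ("read", "write"):
        raise ValueError("kind must be 'read' or 'write'")
    if not MIN_PERCENT <= percent <= MAX_PERCENT:
        raise ValueError(f"percent must be between {MIN_PERCENT} and {MAX_PERCENT}")
    return f"SET dynamodb.throughput.{kind}.percent={percent};"


def target_units(provisioned_units: int, percent: float = DEFAULT_PERCENT) -> float:
    """Approximate capacity units per second Hive tries to consume."""
    return provisioned_units * percent


print(set_statement("read", 1.0))   # SET dynamodb.throughput.read.percent=1.0;
print(set_statement("write", 1.2))  # SET dynamodb.throughput.write.percent=1.2;
print(target_units(50))             # 25.0 (default rate on a 50-unit table)
print(target_units(50, 1.0))        # 50.0
```

Validating the value before issuing the statement catches settings outside the documented range early, instead of discovering them when the Hive job runs.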
