Export data to EMR - HDFS
We can also export the DynamoDB table data to a folder on the Hadoop Distributed File System (HDFS). There might be a use case where you need to process the data on Hadoop using a MapReduce job instead of directly running a Hive query. In that case, we would first need to get the data into an HDFS folder and then run the MapReduce job on it. The following script exports the data from the DynamoDB table to HDFS:
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String,
department String, salary bigint, manager String)
STORED BY
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
"dynamodb.column.mapping" =
"empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

SET dynamodb.throughput.read.percent=1.0;

INSERT OVERWRITE DIRECTORY 'hdfs:///data/employee/' SELECT *
FROM packtPubEmployee;
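Once the INSERT query completes, you can verify the export from the EMR master node's shell. The file name shown here only reflects Hive's usual output naming and may differ on your cluster:

hadoop fs -ls hdfs:///data/employee/
hadoop fs -text hdfs:///data/employee/000000_0 | head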
In the preceding script, by setting the dynamodb.throughput.read.percent variable, we are controlling the read request rate from DynamoDB; a value of 1.0 allows Hive to consume up to the table's full provisioned read throughput, while the default of 0.5 uses only half of it. You can play around with this value and tune it to suit your performance expectations. In the INSERT query, you need to specify the directory on HDFS to which you wish to export the data. Lowering the read percentage also allows us to export data from a production DynamoDB table without risking performance degradation.
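To illustrate the MapReduce route mentioned at the beginning of this section, the following is a minimal Hadoop Streaming sketch that counts employees per department in the exported files. The script names and the output path are made up for this illustration, and the streaming JAR location varies between EMR releases; note also that, by default, Hive writes the INSERT OVERWRITE DIRECTORY output as plain text with fields separated by the Ctrl-A (\x01) character, which is what the mapper splits on:

# Hypothetical file names; run these on the EMR master node.
cat > dept_mapper.py <<'EOF'
#!/usr/bin/env python
import sys
for line in sys.stdin:
    # Hive separates output fields with \x01 by default
    fields = line.rstrip('\n').split('\x01')
    if len(fields) >= 3:
        # The third column is the department; emit a count of 1 for it
        print('%s\t1' % fields[2])
EOF

cat > count_reducer.py <<'EOF'
#!/usr/bin/env python
import sys
current, count = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if key != current:
        # Input is sorted by key, so a key change means the group is done
        if current is not None:
            print('%s\t%d' % (current, count))
        current, count = key, 0
    count += int(value or 1)
if current is not None:
    print('%s\t%d' % (current, count))
EOF

# The streaming JAR path below is typical for recent EMR AMIs; adjust for yours.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input hdfs:///data/employee/ \
  -output hdfs:///data/employee-dept-counts/ \
  -mapper "python dept_mapper.py" \
  -reducer "python count_reducer.py" \
  -file dept_mapper.py \
  -file count_reducer.py

Because Hadoop sorts the mapper output by key before the reduce phase, the reducer only has to total the consecutive counts for each department.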