Export data to EMR - HDFS
We can also export the DynamoDB table data to a folder on the Hadoop Distributed File System (HDFS). There might be a use case where you need to process the data on Hadoop using a MapReduce job instead of directly running a Hive query. In that case, we would first need to get the data into an HDFS folder and then run the MapReduce job on it. The following script exports the data from the DynamoDB table to HDFS:
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String,
department String, salary bigint, manager String)
STORED BY
'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
"dynamodb.column.mapping" =
"empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

SET dynamodb.throughput.read.percent=1.0;

INSERT OVERWRITE DIRECTORY 'hdfs:///data/employee/' SELECT *
FROM packtPubEmployee;
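Once the INSERT query completes, you can verify the export from the EMR master node's shell. The file name shown here only reflects Hive's usual output naming and may differ on your cluster:

hadoop fs -ls hdfs:///data/employee/
hadoop fs -text hdfs:///data/employee/000000_0 | head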
In the preceding script, by setting the dynamodb.throughput.read.percent variable, we are controlling the read request rate from DynamoDB; a value of 1.0 allows Hive to consume up to the table's full provisioned read throughput, while the default of 0.5 uses only half of it. You can play around with this value and tune it to suit your performance expectations. In the INSERT query, you need to specify the directory on HDFS to which you wish to export the data. Lowering the read percentage also allows us to export data from a production DynamoDB table without risking performance degradation.
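To illustrate the MapReduce route mentioned at the beginning of this section, the following is a minimal Hadoop Streaming sketch that counts employees per department in the exported files. The script names and the output path are made up for this illustration, and the streaming JAR location varies between EMR releases; note also that, by default, Hive writes the INSERT OVERWRITE DIRECTORY output as plain text with fields separated by the Ctrl-A (\x01) character, which is what the mapper splits on:

# Hypothetical file names; run these on the EMR master node.
cat > dept_mapper.py <<'EOF'
#!/usr/bin/env python
import sys
for line in sys.stdin:
    # Hive separates output fields with \x01 by default
    fields = line.rstrip('\n').split('\x01')
    if len(fields) >= 3:
        # The third column is the department; emit a count of 1 for it
        print('%s\t1' % fields[2])
EOF

cat > count_reducer.py <<'EOF'
#!/usr/bin/env python
import sys
current, count = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip('\n').partition('\t')
    if key != current:
        # Input is sorted by key, so a key change means the group is done
        if current is not None:
            print('%s\t%d' % (current, count))
        current, count = key, 0
    count += int(value or 1)
if current is not None:
    print('%s\t%d' % (current, count))
EOF

# The streaming JAR path below is typical for recent EMR AMIs; adjust for yours.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input hdfs:///data/employee/ \
  -output hdfs:///data/employee-dept-counts/ \
  -mapper "python dept_mapper.py" \
  -reducer "python count_reducer.py" \
  -file dept_mapper.py \
  -file count_reducer.py

Because Hadoop sorts the mapper output by key before the reduce phase, the reducer only has to total the consecutive counts for each department.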