Export data to AWS S3
Amazon S3 is a cost-effective place to store or archive your data, and Amazon makes it quite easy to export data from DynamoDB to S3. We can export the data in various forms, such as raw data (as it is), formatted data, or compressed data. In this section, we will perform a simple raw-data export.
Consider that you have a table called Employee that contains employee details. The schema for the table would be something like this: Employee (empId:String, yoj:String, dept:String, salary:Number, manager:String).
Suppose we decide to export the data from this table to a bucket called packt-pub-employee under the employee/ folder. First, you can write a Hive query to create a Hive table, as shown in the following commands:
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String, department String, salary bigint, manager String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
"dynamodb.column.mapping" = "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");
Here, we are creating an external Hive table called packtPubEmployee with the same schema as the DynamoDB table. The TBLPROPERTIES clause indicates which DynamoDB table is mapped to this Hive table and how the DynamoDB attributes map to the Hive columns.
Once you run this on Hive by connecting to the EMR cluster, the table definition gets created. The actual data export happens when you run the following HiveQL insert statement:
INSERT OVERWRITE DIRECTORY 's3://packt-pub-employee/employee/'
SELECT * FROM packtPubEmployee;
Here, you can replace the bucket path with your own, and the same goes for the DynamoDB table name. Once you run this statement, EMR will launch a MapReduce job, which will take its own time depending on the amount of data it needs to process. Once the job completes, the exported data will be available in the specified S3 location.
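As a quick sanity check, assuming the EMR cluster has read access to the Employee table, you can query the external table directly. This is only an illustrative check, not part of the export; the read goes through the DynamoDB storage handler (and so performs a scan that consumes read capacity), so keep it small:

DESCRIBE packtPubEmployee;
SELECT empid, department, salary FROM packtPubEmployee LIMIT 5;

The beginning of this section also mentions compressed exports. A minimal sketch of how that might look is shown here; the compression settings are standard Hive/Hadoop properties rather than anything specific to DynamoDB, and the employee_compressed/ folder is just a hypothetical destination:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE DIRECTORY 's3://packt-pub-employee/employee_compressed/'
SELECT * FROM packtPubEmployee;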