Export data to AWS S3
Amazon S3 is a cost-effective place to store or archive your data, and Amazon makes it quite easy to export data from DynamoDB to S3. We can export the data in various forms, such as raw data (as it is), formatted data, or compressed data. In this section, we will perform a simple raw-data export.
Consider that you have a table called Employee that contains employee details. The schema for the table would be something like this: Employee (empId:String, yoj:String, dept:String, salary:Number, manager:String).
Suppose we decide to export the data from this table to a bucket called packt-pub-employee under the employee/ folder. First, you can write a Hive query to create a Hive table, as shown in the following commands:
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String, department String, salary bigint, manager String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
"dynamodb.column.mapping" = "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");
Here, we are creating an external Hive table called packtPubEmployee with the same schema as the DynamoDB table. The TBLPROPERTIES clause indicates which DynamoDB table is mapped to this Hive table and how the DynamoDB attributes map to the Hive columns.
Once you run this on Hive by connecting to the EMR cluster, the table definition gets created. The actual data export happens when you run the following HiveQL insert statement:
INSERT OVERWRITE DIRECTORY 's3://packt-pub-employee/employee/'
SELECT * FROM packtPubEmployee;
Here, you can replace the bucket path with your own, and the same goes for the DynamoDB table name. Once you run this statement, EMR will launch a MapReduce job, which will take its own time depending on the amount of data it needs to process. Once the job completes, the exported data will be available in the specified S3 location.
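As a quick sanity check, assuming the EMR cluster has read access to the Employee table, you can query the external table directly. This is only an illustrative check, not part of the export; the read goes through the DynamoDB storage handler (and so performs a scan that consumes read capacity), so keep it small:

DESCRIBE packtPubEmployee;
SELECT empid, department, salary FROM packtPubEmployee LIMIT 5;

The beginning of this section also mentions compressed exports. A minimal sketch of how that might look is shown here; the compression settings are standard Hive/Hadoop properties rather than anything specific to DynamoDB, and the employee_compressed/ folder is just a hypothetical destination:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE DIRECTORY 's3://packt-pub-employee/employee_compressed/'
SELECT * FROM packtPubEmployee;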