Compressed data export
Most of the time, data export is done to archive old data on S3. Such data is not frequently used, but it needs to be kept somewhere so that it can be retrieved if needed. Hadoop/Hive supports various data compression algorithms, so you can decide which algorithm to use to store the data in a compressed form. The following example demonstrates how to export DynamoDB data to AWS S3 as compressed flat files.
To do so, you just need to set certain property values in the Hive console before you run the export job. Here is a sample Hive script that exports data from the DynamoDB table called Employee to an S3 bucket as compressed files:
SET hive.exec.compress.output=true;    -- Turns output compression on.
SET io.seqfile.compression.type=BLOCK; -- Sets the type of compression.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
                                       -- Sets the algorithm to be used for compression.

-- External table backed by the DynamoDB table Employee.
CREATE EXTERNAL TABLE packtPubEmployee (empid string, yoj string,
  department string, salary bigint, manager string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
  "dynamodb.column.mapping" =
  "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

-- External table backed by the S3 location that will hold the compressed, tab-delimited export.
CREATE EXTERNAL TABLE packtPubEmployee_tab_formatted (empid string, yoj string,
  department string, salary bigint, manager string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://packt-pub-employee/employee/';

-- Copy all rows from DynamoDB to S3; the output files are compressed with the codec set above.
INSERT OVERWRITE TABLE packtPubEmployee_tab_formatted
SELECT * FROM packtPubEmployee;
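Gzip is only one of the codecs that Hadoop ships with; if you prefer a different trade-off between write speed and file size, you can point the same property at another codec class. The following snippet is a minimal variation of the settings above, assuming you want bzip2-compressed output instead; the table definitions and the INSERT statement stay the same:
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
-- Use bzip2 instead of gzip: a better compression ratio at the cost of slower writes.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;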