Compressed data export
Most of the time, data export is done to archive old data on S3. Such data is not frequently used, but it needs to be kept somewhere so that it can be retrieved if needed. Hadoop/Hive supports various data compression algorithms, so you can decide which algorithm to use to store the data in a compressed form. The following example demonstrates how to export DynamoDB data to AWS S3 as compressed flat files.
To do so, you just need to set certain property values in the Hive console before you run the export job. Here is a sample Hive script that exports data from the DynamoDB table called Employee to an S3 bucket as compressed files:
SET hive.exec.compress.output=true;    -- Turns output compression on.
SET io.seqfile.compression.type=BLOCK; -- Sets the type of compression.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
                                       -- Sets the algorithm to be used for compression.

-- External table backed by the DynamoDB table Employee.
CREATE EXTERNAL TABLE packtPubEmployee (empid string, yoj string,
  department string, salary bigint, manager string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
  "dynamodb.column.mapping" =
  "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

-- External table backed by the S3 location that will hold the compressed, tab-delimited export.
CREATE EXTERNAL TABLE packtPubEmployee_tab_formatted (empid string, yoj string,
  department string, salary bigint, manager string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://packt-pub-employee/employee/';

-- Copy all rows from DynamoDB to S3; the output files are compressed with the codec set above.
INSERT OVERWRITE TABLE packtPubEmployee_tab_formatted
SELECT * FROM packtPubEmployee;
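Gzip is only one of the codecs that Hadoop ships with; if you prefer a different trade-off between write speed and file size, you can point the same property at another codec class. The following snippet is a minimal variation of the settings above, assuming you want bzip2-compressed output instead; the table definitions and the INSERT statement stay the same:
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
-- Use bzip2 instead of gzip: a better compression ratio at the cost of slower writes.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;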