SELECT department, count(*) FROM packtPubEmployee GROUP BY department;
Joining two DynamoDB tables
Sometimes we also want to combine data from two DynamoDB tables. In that case, you create two Hive tables, each mapped to the corresponding table in DynamoDB, and then use the JOIN keyword along with the key on which the tables should be joined to get the joined dataset.
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String,
  department String, salary bigint, manager String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
  "dynamodb.column.mapping" =
  "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

CREATE EXTERNAL TABLE packtPubDepartment (department String, name String,
  head String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Department",
  "dynamodb.column.mapping" = "department:dept,name:name,head:head");

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartment d
ON (e.department = d.department);
This would result in a dataset with two columns: the first would be the employee ID and the second the name of the department that employee belongs to. You can add further filtering using WHERE conditions if you need more specific information.
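For instance, a query that restricts the result to departments with a particular head (the value 'John' here is purely hypothetical) could look like this:

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartment d
ON (e.department = d.department)
WHERE d.head = 'John';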
Joining tables from DynamoDB and S3
We can also join tables from different sources, for example, one table from DynamoDB and another from AWS S3 stored as a flat file, or a table from DynamoDB and a table created out of data present on HDFS. For such cases, you just need to create a Hive table that maps to the correct location and the correct service, and then you can run the same kind of join query as before.
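As a minimal sketch, assuming the department data is stored as a comma-separated flat file under a hypothetical bucket path such as s3://packtpub-data/departments/ with the same three columns, the S3-backed table and the join might look like this:

CREATE EXTERNAL TABLE packtPubDepartmentS3 (department String, name String,
  head String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://packtpub-data/departments/';

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartmentS3 d
ON (e.department = d.department);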