SELECT department, count(*) FROM packtPubEmployee GROUP BY department;
Joining two DynamoDB tables
Sometimes we also want to combine data from two DynamoDB tables. In that case, you create two Hive tables, each mapped to the corresponding table in DynamoDB, and then use the JOIN keyword along with the key on which the tables should be joined to get the joined dataset.
CREATE EXTERNAL TABLE packtPubEmployee (empid String, yoj String,
  department String, salary bigint, manager String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Employee",
  "dynamodb.column.mapping" =
  "empid:empId,yoj:yoj,department:dept,salary:salary,manager:manager");

CREATE EXTERNAL TABLE packtPubDepartment (department String, name String,
  head String)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Department",
  "dynamodb.column.mapping" = "department:dept,name:name,head:head");

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartment d
ON (e.department = d.department);
This would result in a dataset with two columns: the first would be the employee ID and the second the name of the department that employee belongs to. You can add further filtering using WHERE conditions if you need more specific information.
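For instance, a query that restricts the result to departments with a particular head (the value 'John' here is purely hypothetical) could look like this:

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartment d
ON (e.department = d.department)
WHERE d.head = 'John';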
Joining tables from DynamoDB and S3
We can also join tables from different sources, for example, one table from DynamoDB and another from AWS S3 stored as a flat file, or a table from DynamoDB and a table created out of data present on HDFS. For such cases, you just need to create a Hive table that maps to the correct location and the correct service, and then you can run the same kind of join query as before.
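As a minimal sketch, assuming the department data is stored as a comma-separated flat file under a hypothetical bucket path such as s3://packtpub-data/departments/ with the same three columns, the S3-backed table and the join might look like this:

CREATE EXTERNAL TABLE packtPubDepartmentS3 (department String, name String,
  head String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://packtpub-data/departments/';

SELECT e.empid, d.name
FROM packtPubEmployee e JOIN packtPubDepartmentS3 d
ON (e.department = d.department);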