DynamoDB with Redshift, Data Pipeline, and MapReduce - DynamoDB Applied Design Patterns


When the cluster status is WAITING, the master node is ready to accept connections. By connecting to the master node over SSH, you can run CLI operations on it.
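
The check-then-connect step above can be sketched as a small helper. This is a minimal illustration, not part of the original walkthrough: the function name, the key file, and the host name are hypothetical placeholders, and it assumes the default hadoop login user on the EMR master node.

```python
def ssh_command_for_master(cluster_state, master_public_dns, key_path, user="hadoop"):
    """Return the ssh argument list for the master node, or None if not ready.

    The cluster must be in the WAITING state before the master node
    accepts connections. All argument values are supplied by the caller.
    """
    if cluster_state != "WAITING":
        return None
    return ["ssh", "-i", key_path, f"{user}@{master_public_dns}"]


# Hypothetical values, for illustration only.
command = ssh_command_for_master(
    "WAITING", "ec2-host.example.com", "mykey.pem"
)
print(" ".join(command))
```

The returned list can be passed to subprocess.run once the real key path and master public DNS name are filled in.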

Step 4 - Setting up the Hive table to run Hive interactive commands

Hive is a data warehousing application that you can use to query data in Amazon EMR clusters with the HiveQL language. To run Hive commands, follow the given steps:

1. At the Hadoop prompt on the master node, type hive.

2. You will see a Hive prompt as follows:

hive>

3. Run a Hive command that maps a table in Hive to the data in DynamoDB. This table acts as a reference to the data stored in DynamoDB; the data itself is not stored locally in Hive. Use the following command as a reference:

CREATE EXTERNAL TABLE hiveuchit (col1 string, col2 bigint, col3 array<string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "dynamodbuchit",
"dynamodb.column.mapping" = "col1:name,col2:year,col3:holidays");

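The column mapping in the command above pairs each Hive column with a DynamoDB attribute in the form hive_column:dynamodb_attribute. As a rough sketch of how the pieces fit together, the following helper assembles the same statement from a list of columns. The function name and its parameters are hypothetical and exist only for illustration; the table names and mapping are the ones used above.

```python
def build_dynamodb_table_ddl(hive_table, dynamodb_table, columns):
    """Build a CREATE EXTERNAL TABLE statement for a DynamoDB-backed Hive table.

    columns is a list of (hive_column, hive_type, dynamodb_attribute) tuples.
    """
    # Column definitions for the Hive side, e.g. "col1 string, col2 bigint".
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # Mapping string, e.g. "col1:name,col2:year".
    mapping = ",".join(f"{name}:{attribute}" for name, _, attribute in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({column_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        f'TBLPROPERTIES ("dynamodb.table.name" = "{dynamodb_table}",\n'
        f'"dynamodb.column.mapping" = "{mapping}");'
    )


ddl = build_dynamodb_table_ddl(
    "hiveuchit",
    "dynamodbuchit",
    [
        ("col1", "string", "name"),
        ("col2", "bigint", "year"),
        ("col3", "array<string>", "holidays"),
    ],
)
print(ddl)
```

The printed statement can be pasted at the hive> prompt. Generating it this way keeps the column list and the mapping string in sync when the table has many attributes.
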
While you run Hive queries against the DynamoDB table, make sure the table has enough provisioned read capacity units, because the queries consume the table's read throughput.
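
To get a feel for what "enough" read capacity means, here is a back-of-the-envelope estimate of how long a full read of the table might take. It is only a sketch under stated assumptions: one read capacity unit covers one strongly consistent read of up to 4 KB per second (or two eventually consistent reads), items pack evenly into 4 KB reads, and read_fraction is a hypothetical parameter for the share of provisioned capacity the Hive job is allowed to use. Real scans carry per-item rounding and other overhead, so treat the result as a lower bound.

```python
RCU_BYTES_PER_SECOND = 4 * 1024  # one strongly consistent 4 KB read per second


def estimate_scan_seconds(table_bytes, provisioned_rcu,
                          read_fraction=0.5, eventually_consistent=False):
    """Roughly estimate the seconds needed to read table_bytes from DynamoDB."""
    if provisioned_rcu <= 0 or read_fraction <= 0:
        raise ValueError("provisioned_rcu and read_fraction must be positive")
    # Eventually consistent reads cost half as much capacity per 4 KB.
    bytes_per_rcu = RCU_BYTES_PER_SECOND * (2 if eventually_consistent else 1)
    return table_bytes / (provisioned_rcu * read_fraction * bytes_per_rcu)


# Hypothetical example: a 1 GiB table with 100 provisioned read capacity
# units, of which the Hive job may use half.
seconds = estimate_scan_seconds(1024 ** 3, 100, read_fraction=0.5)
print(f"{seconds:.2f} seconds")
```

If the estimate is too long for the query you plan to run, raising the provisioned read capacity before the job starts is the obvious lever.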

You have now completed the setup and configuration of EMR with DynamoDB. Next, we will look at some advanced Hive commands for operations such as exporting, importing, querying, and joining.
