Integrating with AWS EMR
Hadoop is one of the most widely used extract, transform, and load (ETL) tools these
days. Most companies use it to derive more and more insight from the data available
to them. However, creating and maintaining a Hadoop cluster can be quite a
time-consuming job, especially when you don't have much exposure to Linux/Unix
environments. Also, if you need to run Hadoop in production, you would need to hire
a specialist Hadoop administrator, which is an overhead in terms of cost. To solve
this, AWS has introduced hosted Hadoop as a service, where you just need to provide
your requirements in terms of cluster configuration (the number of data nodes and
the instance sizes, based on the amount of data you want to process) and any
additional services, such as Hive and Pig, if required. Once done, with a single
click of a button, your Hadoop cluster is ready.
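As an illustration, the kind of cluster configuration just described can be expressed with the AWS CLI. This is only a sketch: the cluster name, release label, instance type and count, and key pair name below are placeholder assumptions, not values prescribed by this book.

```shell
# Launch a small EMR cluster with Hive and Pig installed.
# All names and sizes below are illustrative placeholders.
aws emr create-cluster \
    --name "demo-etl-cluster" \
    --release-label emr-5.36.0 \
    --applications Name=Hive Name=Pig \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --ec2-attributes KeyName=my-key-pair \
    --use-default-roles
```

The command returns the new cluster's ID (a `j-...` identifier), which you can pass to later commands to submit work to the cluster or to terminate it.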
You can find more details about how to launch an Amazon Elastic MapReduce (EMR)
cluster and how to work with it at
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html.
In this section, we will cover the following topics:
• Exporting data from DynamoDB
• Querying and joining tables in DynamoDB using AWS EMR
• Importing data to DynamoDB
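As a preview of the first topic, exporting a DynamoDB table with EMR is typically done from Hive on the cluster's master node, using the DynamoDB storage handler that EMR ships with. The table name, column mapping, and S3 bucket below are illustrative assumptions for the sketch, not values from this book:

```shell
# Run on the EMR master node. The DynamoDB table, columns, and S3
# bucket are illustrative; adjust dynamodb.column.mapping to match
# your own table's attributes.
hive -e "
CREATE EXTERNAL TABLE ddb_orders (id STRING, total DOUBLE)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  'dynamodb.table.name' = 'Orders',
  'dynamodb.column.mapping' = 'id:Id,total:Total'
);

-- Copy the DynamoDB table's contents to S3 as the export.
INSERT OVERWRITE DIRECTORY 's3://my-bucket/exports/orders/'
SELECT * FROM ddb_orders;
"
```

The external table is only a mapping onto the live DynamoDB table; the `INSERT OVERWRITE DIRECTORY` statement is what actually reads the items and writes them out to S3.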