Figure 13.3 Amazon EMR job config
a. Enter "Sandbox Word Count" as your job name.
b. Keep the default Hadoop distribution, Amazon Distribution.
c. Use the latest AMI version. As of this writing, 2.4.1 is the current
version.
d. Ensure that the Run Your Own Application option is selected.
e. Select Streaming as the processing job type.
4. Specify the parameters needed to execute your job flow. These include
the input and output locations, the mapper and reducer, and any additional
arguments your job requires, as shown in Figure 13.4. Table 13.1
describes each option, including the values used for this demo. Note
that you must substitute your own bucket name throughout this demo. Click
Continue when you have finished.
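The parameters in step 4 correspond roughly to Hadoop Streaming's -input, -output, -mapper, and -reducer options. The following minimal sketch assembles them as an argument list so you can see how the pieces fit together. The bucket name, the input/ and output/ prefixes, and the wordSplitter.py key are hypothetical placeholders, not the values from Table 13.1. The sketch also assumes, as in this demo, that the mapper is referenced by its S3 location and that the reducer is the built-in "aggregate" reducer.

```python
# Hypothetical S3 layout (placeholders, substitute your own bucket):
#   s3://<bucket>/<input_prefix>   holds the input text files
#   s3://<bucket>/<output_prefix>  must not exist yet; the job creates it
#   s3://<bucket>/<mapper_key>     is the streaming mapper script


def build_streaming_args(bucket, mapper_key, input_prefix="input", output_prefix="output"):
    """Return Hadoop Streaming-style arguments for the word-count job flow."""
    if not bucket:
        raise ValueError("substitute your own S3 bucket name")
    base = f"s3://{bucket}"
    return [
        "-input", f"{base}/{input_prefix}",
        "-output", f"{base}/{output_prefix}",
        "-mapper", f"{base}/{mapper_key}",
        # "aggregate" selects Hadoop's built-in aggregate reducer
        # instead of a custom scripted reducer.
        "-reducer", "aggregate",
    ]


if __name__ == "__main__":
    print(" ".join(build_streaming_args("my-bucket", "wordSplitter.py")))
```

Hadoop refuses to start a job whose output directory already exists, so choose a fresh output prefix on each run.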
NOTE
For this demo, the built-in
org.apache.hadoop.mapreduce.lib.aggregate reducer is
used in place of a custom scripted reducer.
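Because the built-in aggregate reducer does the summing, the mapper only needs to label its output so the reducer knows which aggregation to apply. Hadoop's aggregate package reads keys of the form "LongValueSum:<key>" and sums the long values emitted for each key. The following minimal sketch of such a word-count mapper is an illustration, not the demo's own script. The word pattern, the lowercasing, and the local simulate_long_value_sum helper (a hypothetical stand-in for testing without a cluster) are assumptions.

```python
import re
import sys

# Assumed tokenization: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def map_line(line):
    """Yield one aggregate-style record per word in a line of input text."""
    for word in WORD_RE.findall(line.lower()):
        # The "LongValueSum:" prefix tells the aggregate reducer to sum
        # the values emitted for this word.
        yield f"LongValueSum:{word}\t1"


def run_mapper(stream_in=None, stream_out=None):
    """Streaming mapper loop: read lines from stdin, write records to stdout."""
    stream_in = sys.stdin if stream_in is None else stream_in
    stream_out = sys.stdout if stream_out is None else stream_out
    for line in stream_in:
        for record in map_line(line):
            stream_out.write(record + "\n")


def simulate_long_value_sum(records):
    """Hypothetical local stand-in for the reducer's LongValueSum behavior."""
    totals = {}
    for record in records:
        key, value = record.rstrip("\n").split("\t", 1)
        aggregator, word = key.split(":", 1)
        if aggregator != "LongValueSum":
            raise ValueError(f"unexpected aggregator: {aggregator}")
        totals[word] = totals.get(word, 0) + int(value)
    return totals
```

To use this as the job's mapper script, you would end the file with `if __name__ == "__main__": run_mapper()` so that it processes standard input. That entry point is left out here so the functions can be exercised on their own.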