NOTE
Hadoop Streaming works by passing data through the standard input (stdin) and standard output (stdout). Any executable that reads from stdin and writes to stdout can be used as either a mapper or a reducer. For more information on Hadoop Streaming, please visit: http://hadoop.apache.org/docs/r0.18.3/streaming.html.
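To make the stdin/stdout contract concrete, the following is a minimal sketch of a word-splitting streaming mapper in Python. It is illustrative only and is not necessarily the code contained in the wordsplitter.py file provided with the chapter materials.

#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch (illustrative only).
# Reads lines of text from stdin and emits tab-separated
# word<TAB>1 pairs on stdout; Hadoop Streaming groups identical
# keys before they reach the reducer.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))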
Before beginning this tutorial, complete the following:
• Create a bucket in your Amazon S3 account, and within that bucket create a folder named input. The bucket name must be unique because the S3 namespace is shared globally. (Amazon S3 and its features and requirements are discussed in greater detail later in this chapter.)
• Locate the wordcount.txt and wordsplitter.py files provided in the Chapter 13 materials (http://www.wiley.com/go/microsoftbigdatasolutions.com), upload the data file to the input folder, and upload the wordsplitter.py file to the root of your S3 bucket. (A scripted alternative to the console upload is sketched after this list.)
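If you prefer to script these uploads instead of using the S3 console, the following sketch uses the boto3 library (an assumption; the chapter itself uses the AWS web console, and the bucket name below is a placeholder you must replace with your own).

# Sketch only: upload the tutorial files to S3 with boto3.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

s3 = boto3.client("s3")
bucket = "your-unique-bucket-name"  # placeholder; bucket names are globally unique

# The data file goes in the input folder; the mapper script goes in the bucket root.
s3.upload_file("wordcount.txt", bucket, "input/wordcount.txt")
s3.upload_file("wordsplitter.py", bucket, "wordsplitter.py")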
Now you are ready to stand up your cluster using the following steps:
1. In a web browser, navigate to the Amazon Web Services portal at https://console.aws.amazon.com/. Find and click the Elastic MapReduce link under the Compute & Networking section (see Figure 13.1).
 