NOTE
Hadoop Streaming works by passing data through the standard input (stdin) and standard output (stdout). Any executable that reads from stdin and writes to stdout can be used as either a mapper or a reducer. For more information on Hadoop Streaming, please visit: http://hadoop.apache.org/docs/r0.18.3/streaming.html.
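To make the stdin/stdout contract concrete, the following is a minimal sketch of a word-splitting streaming mapper in Python. It is illustrative only and is not necessarily the code contained in the wordsplitter.py file provided with the chapter materials.

#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch (illustrative only).
# Reads lines of text from stdin and emits tab-separated
# word<TAB>1 pairs on stdout; Hadoop Streaming groups identical
# keys before they reach the reducer.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))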
Before beginning this tutorial, complete the following:
• Create a bucket in your Amazon S3 account, and within that bucket create a folder named input. The bucket name must be unique because the S3 namespace is shared globally. (Amazon S3 and its features and requirements are discussed in greater detail later in this chapter.)
• Locate the wordcount.txt and wordsplitter.py files provided in the Chapter 13 materials (http://www.wiley.com/go/microsoftbigdatasolutions.com), upload the data file to the input folder, and upload the wordsplitter.py file to the root of your S3 bucket. (A scripted alternative to the console upload is sketched after this list.)
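If you prefer to script these uploads instead of using the S3 console, the following sketch uses the boto3 library (an assumption; the chapter itself uses the AWS web console, and the bucket name below is a placeholder you must replace with your own).

# Sketch only: upload the tutorial files to S3 with boto3.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

s3 = boto3.client("s3")
bucket = "your-unique-bucket-name"  # placeholder; bucket names are globally unique

# The data file goes in the input folder; the mapper script goes in the bucket root.
s3.upload_file("wordcount.txt", bucket, "input/wordcount.txt")
s3.upload_file("wordsplitter.py", bucket, "wordsplitter.py")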
Now you are ready to stand up your cluster using the following steps:
1. In a web browser, navigate to the Amazon Web Services portal at https://console.aws.amazon.com/. Find and click the Elastic MapReduce link under the Compute & Networking section (see Figure 13.1).
 