if __name__ == '__main__':
    MRFemaleBirthCounter.run()
It's time to test our script using the slice of test data we created earlier. Multistep
MapReduce jobs introduce more complexity than single-step jobs, with more opportunity
for something to go wrong. Fortunately, a cool feature of mrjob is the ability to
specify a particular step to run using the --step-num flag on the command line. This
is a useful way to perform a quick sanity check on one section of our pipeline. As shown
in Listing 8.12, we can ask the script to send the output of the first mapper step
(step numbers are zero-based) to stdout. Just as before, we can also run the entire
pipeline on our test data.
Listing 8.12 Testing the mrjob_multistep_example.py script locally
# Test the output of the mapper from the first step
> python mrjob_multistep_example.py --mapper \
--step-num=0 < birth_data_big.txt
"F" "10-2010"
"F" "11-2010"
"F" "09-2010"
"F" "10-2010"
# etc...
# Test the output of the entire pipeline
> python mrjob_multistep_example.py < birth_data_sample.txt
"Female 01-2010" 4285
"Female 02-2010" 4002
"Female 03-2010" 4365
"Female 04-2010" 4144
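To make the idea of checking a single step concrete, here is a minimal sketch in plain Python (no mrjob required) of a two-step pipeline in which the first mapper can be run on its own. The comma-separated input format, the function names, and the sample records are assumptions made for illustration; this is not the script from the listing, and it does not reproduce how mrjob schedules steps internally.

```python
from collections import defaultdict


def mapper_extract_female(line):
    """Step 0 mapper: emit (gender, month-year) for female births only.

    Assumes each input line looks like 'F,10-2010' (hypothetical format).
    """
    gender, month_year = line.strip().split(",")
    if gender == "F":
        yield gender, month_year


def mapper_label(gender, month_year):
    """Step 1 mapper: re-key each record by a readable label, with a count of 1."""
    yield "Female " + month_year, 1


def reducer_sum(key, values):
    """Step 1 reducer: add up the counts for each label."""
    yield key, sum(values)


def run_first_mapper(lines):
    """Run only the step 0 mapper, as a quick check on its output."""
    output = []
    for line in lines:
        output.extend(mapper_extract_female(line))
    return output


def run_pipeline(lines):
    """Run both steps end to end and return sorted (label, count) pairs."""
    grouped = defaultdict(list)
    for gender, month_year in run_first_mapper(lines):
        for key, value in mapper_label(gender, month_year):
            grouped[key].append(value)
    results = []
    for key in sorted(grouped):
        results.extend(reducer_sum(key, grouped[key]))
    return results


# Hypothetical sample records, not taken from the birth data set.
sample = ["F,10-2010", "M,10-2010", "F,11-2010", "F,10-2010"]

print(run_first_mapper(sample))  # intermediate output of the first mapper
print(run_pipeline(sample))      # final output of the whole pipeline
```

Inspecting the intermediate pairs first makes it easier to tell whether an unexpected final count comes from the filtering step or from the aggregation step.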
Running mrjob Scripts on Elastic MapReduce
One of the core principles of this book is that data processing solutions should avoid
managing hardware and infrastructure whenever it is practical and affordable. A great
feature of mrjob is the ability to easily run MapReduce jobs using Amazon's Elastic
MapReduce (EMR) service.
In order to take advantage of EMR integration, first create an Amazon Web Services
account and sign up for the Elastic MapReduce service. Once these are created,
you will also need to note (but don't share!) the Access Key ID and the corresponding
Secret Access Key (under the Security Credentials section of the AWS accounts page).
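Because credentials like these should never be hard-coded into a script, one option is to read them from the environment. The following is a minimal sketch, in plain Python, of a pre-flight check that reports which of the two environment variables discussed in this section (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) are missing, without ever printing their values. The helper name missing_aws_credentials is hypothetical and is not part of mrjob.

```python
import os

# The two environment variable names mentioned in this section.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: pass a dict to check something other than
    the real process environment (handy for testing).
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        # Report only the variable names, never the secret values.
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```

Running a check like this before launching a job gives a clear message up front instead of an authentication failure partway through.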
With these Access Key ID and Secret Access Key values, set the AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY environment variables on the machine on which your
mrjob script is hosted. It is also possible to add many of these values to an mrjob.conf
 
 