This file essentially replaces the last part of the script in Listing 12.7, which read from standard input and wrote to standard output.
You just need to tell the MapReduce framework where to get the source data and where to write the results. The input_reader and output_writer fields indicate that you want to read and write data in GCS. The params field contains parameters that control the behavior of these modules; in this case, they specify the input and output locations as well as the degree of parallelism. Note that files in GCS are referenced here using a slightly different syntax: /gs/bucket/object instead of the gs://bucket/object format used by BigQuery.
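Because the two path styles are easy to mix up when you copy a location from a BigQuery job into the MapReduce configuration, a small conversion helper can be handy. The following is a minimal sketch that assumes only the two formats described above; the function and constant names are hypothetical and are not part of the MapReduce SDK.

```python
# Hypothetical helpers (not part of the MapReduce SDK) that translate between
# the gs://bucket/object URIs used by BigQuery and the /gs/bucket/object
# paths used in the MapReduce GCS configuration.

BIGQUERY_PREFIX = "gs://"
MAPREDUCE_PREFIX = "/gs/"


def gs_uri_to_mapreduce_path(uri):
    """Convert gs://bucket/object to /gs/bucket/object."""
    if not uri.startswith(BIGQUERY_PREFIX):
        raise ValueError("expected a gs:// URI, got %r" % uri)
    rest = uri[len(BIGQUERY_PREFIX):]
    # Require a non-empty bucket name before the first slash.
    if not rest or rest.startswith("/"):
        raise ValueError("missing bucket name in %r" % uri)
    return MAPREDUCE_PREFIX + rest


def mapreduce_path_to_gs_uri(path):
    """Convert /gs/bucket/object back to gs://bucket/object."""
    if not path.startswith(MAPREDUCE_PREFIX):
        raise ValueError("expected a /gs/ path, got %r" % path)
    rest = path[len(MAPREDUCE_PREFIX):]
    # Same bucket-name check as above, for the reverse direction.
    if not rest or rest.startswith("/"):
        raise ValueError("missing bucket name in %r" % path)
    return BIGQUERY_PREFIX + rest
```

For example, gs_uri_to_mapreduce_path("gs://my-bucket/logs/part-0.json") returns "/gs/my-bucket/logs/part-0.json", and mapreduce_path_to_gs_uri reverses the conversion.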
The default output bucket is set to bigquery-e2e. You will not have write access to this bucket, so either override the value with your own GCS bucket here or wait until you start the MapReduce and set the value in the MapReduce settings page.
If you've modified any of the files in the App Engine app, you'll need to re-upload the most recent version. You can do this by re-running the previous appcfg.py command:
$ appcfg.py update controller.yaml
After you have uploaded your application with the MapReduce SDK, you can navigate to http://<your-app>.appspot.com/mapreduce.
There you will see a console that lists the MapReduce jobs that have been created, as well as a form for creating new jobs from the templates you defined. In this case you see a single option representing your MapReduce configuration, with form fields that let you edit its parameters.
Unfortunately, the automatic forms support only simple string parameters, but some of the input modules expect a list or dictionary of values. To turn the form parameters into the configuration dictionary that the I/O modules need, you can provide a Python function to do the translation. The params_validator setting in the configuration names a parameter transformation and validation function. This function is passed a dictionary of all the values in the form as its single argument, and it can modify that dictionary to turn it into a valid MapReduce configuration. If it raises an exception, MapReduce creation simply fails. We have defined a simple version for this pair of reader and writer in validator.py:
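To make the idea concrete, here is a hypothetical sketch of such a function; it is not the contents of validator.py. The form field names (input_bucket, input_objects, output_bucket, shard_count), the default shard count of 8, and the nested input_reader/output_writer keys it writes back are all assumptions made for this example, so check the reader and writer classes you actually configure for the keys they expect. The function would be referenced by name from the params_validator setting (for example, validator.validate_params, also a hypothetical name).

```python
# Hypothetical params_validator sketch; NOT the book's validator.py.
# Form field names and the nested keys written back are assumptions.

DEFAULT_OUTPUT_BUCKET = "bigquery-e2e"  # default bucket readers cannot write to


def validate_params(params):
    """Turn flat string form values into nested reader/writer settings.

    Modifies params in place. Raising an exception makes MapReduce
    creation fail, so each check raises ValueError with a readable message.
    """
    input_bucket = params.get("input_bucket", "").strip()
    if not input_bucket:
        raise ValueError("input_bucket is required")

    # The form can only submit strings, so accept a comma-separated list.
    objects = [name.strip()
               for name in params.get("input_objects", "").split(",")
               if name.strip()]
    if not objects:
        raise ValueError("input_objects must name at least one object")

    output_bucket = params.get("output_bucket", "").strip()
    if not output_bucket or output_bucket == DEFAULT_OUTPUT_BUCKET:
        raise ValueError("set output_bucket to a GCS bucket you can write to")

    # The degree of parallelism also arrives as a string (default is hypothetical).
    try:
        shard_count = int(params.get("shard_count", "8"))
    except (TypeError, ValueError):
        raise ValueError("shard_count must be an integer")
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # Nested dictionaries for the I/O modules (key names are assumptions).
    params["input_reader"] = {"bucket_name": input_bucket, "objects": objects}
    params["output_writer"] = {"bucket_name": output_bucket}
    params["shard_count"] = shard_count
```

Checking for the bigquery-e2e default in the validator means a job that would write to the unwritable default bucket fails at creation time, and the error message points at the field to fix.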