def adjust_spec(params):
    # Convert the space-separated file list into a Python list.
    params['files'] = params['files'].split()
    # Direct output to the requested bucket, naming files test/<job id>-<shard>.
    params['output_writer'] = {
        'bucket_name': params['output_bucket'],
        'naming_format': 'test/$id-$num'}
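To see what the validator does to the parameter dictionary, here is a small self-contained sketch; the parameter values are hypothetical, but the transformation matches adjust_spec above:

```python
def adjust_spec(params):
    # Convert the space-separated file list into a Python list.
    params['files'] = params['files'].split()
    # Direct output to the requested bucket, naming files test/<job id>-<shard>.
    params['output_writer'] = {
        'bucket_name': params['output_bucket'],
        'naming_format': 'test/$id-$num'}

# Hypothetical values, as they might arrive from the configuration file.
params = {
    'files': 'gs://my-bucket/a.txt gs://my-bucket/b.txt',
    'output_bucket': 'my-bucket',
}
adjust_spec(params)
print(params['files'])
# ['gs://my-bucket/a.txt', 'gs://my-bucket/b.txt']
print(params['output_writer']['bucket_name'])
# my-bucket
```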
Using the validator lets you configure your MapReduce jobs with a simple
configuration file and the AppEngine console, rather than writing
custom code to configure and launch them.
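For reference, a validator like adjust_spec is wired up through the params_validator field of a mapreduce.yaml entry. The sketch below shows the general shape; the job name, module path, and reader/writer class names are illustrative and should be checked against the version of the MapReduce library you are using:

```yaml
mapreduce:
- name: Example job            # hypothetical job name
  mapper:
    handler: main.my_map       # hypothetical mapper function
    input_reader: mapreduce.input_readers.GoogleCloudStorageInputReader
    output_writer: mapreduce.output_writers.GoogleCloudStorageOutputWriter
    params_validator: main.adjust_spec
    params:
    - name: files
    - name: output_bucket
```

With this in place, the files and output_bucket values entered on the AppEngine console are passed through adjust_spec before the job starts.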
Development Server
It is easiest to test AppEngine MapReduce code by uploading it and
running it live rather than using the development server,
dev_appserver.py. The reason is that dev_appserver.py uses a
fake GCS implementation; you would first have to create GCS files at
the appropriate location before the example could work correctly.
Arguably, it would be nicer if dev_appserver.py could use real GCS
files, but at this time it does not support that option.
This is a good time to try running a MapReduce job. Simply set the
output_bucket parameter (just the bucket name, no gs:// or /gs/ is
necessary) to the GCS bucket that you are using for the example, and click
Run. The status page will update and list your job as now running. You
can click the link to the job to follow its progress. When the job finishes
successfully, you can inspect its output by listing the contents of your
bucket. The sample provided writes to filenames of the form
test/<job id>-<shard>. For example:
$ gsutil ls gs://${GCS_BUCKET}/test/*
gs://bigquery-e2e/test/15784101297666AC77A71-0
gs://bigquery-e2e/test/15792505778554A7B9C41-0
This list command displays the output file generated by your job.
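These filenames come from expanding the naming_format string set in adjust_spec, with $id replaced by the MapReduce job ID and $num by the shard number. The expansion can be mimicked with Python's string.Template (the job ID below is hypothetical):

```python
from string import Template

# naming_format from adjust_spec: $id -> job ID, $num -> shard number.
naming_format = Template('test/$id-$num')

# Hypothetical job ID and shard number.
path = naming_format.substitute(id='15784101297666AC77A71', num='0')
print(path)
# test/15784101297666AC77A71-0
```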