def adjust_spec(params):
    # Convert the space-separated file list into a Python list.
    params['files'] = params['files'].split()
    # Direct output to the requested bucket, naming files test/<job id>-<shard>.
    params['output_writer'] = {
        'bucket_name': params['output_bucket'],
        'naming_format': 'test/$id-$num'}
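To see what the validator does to the parameter dictionary, here is a small self-contained sketch; the parameter values are hypothetical, but the transformation matches adjust_spec above:

```python
def adjust_spec(params):
    # Convert the space-separated file list into a Python list.
    params['files'] = params['files'].split()
    # Direct output to the requested bucket, naming files test/<job id>-<shard>.
    params['output_writer'] = {
        'bucket_name': params['output_bucket'],
        'naming_format': 'test/$id-$num'}

# Hypothetical values, as they might arrive from the configuration file.
params = {
    'files': 'gs://my-bucket/a.txt gs://my-bucket/b.txt',
    'output_bucket': 'my-bucket',
}
adjust_spec(params)
print(params['files'])
# ['gs://my-bucket/a.txt', 'gs://my-bucket/b.txt']
print(params['output_writer']['bucket_name'])
# my-bucket
```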
Using the validator lets you configure your MapReduce jobs with a simple
configuration file and the AppEngine console, rather than writing
custom code to configure and launch them.
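For reference, a validator like adjust_spec is wired up through the params_validator field of a mapreduce.yaml entry. The sketch below shows the general shape; the job name, module path, and reader/writer class names are illustrative and should be checked against the version of the MapReduce library you are using:

```yaml
mapreduce:
- name: Example job            # hypothetical job name
  mapper:
    handler: main.my_map       # hypothetical mapper function
    input_reader: mapreduce.input_readers.GoogleCloudStorageInputReader
    output_writer: mapreduce.output_writers.GoogleCloudStorageOutputWriter
    params_validator: main.adjust_spec
    params:
    - name: files
    - name: output_bucket
```

With this in place, the files and output_bucket values entered on the AppEngine console are passed through adjust_spec before the job starts.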
Development Server
It is easiest to test AppEngine MapReduce code by uploading it and
running it live rather than using the development server,
dev_appserver.py. The reason is that dev_appserver.py uses a
fake GCS implementation; you would first have to create GCS files at
the appropriate location before the example could work correctly.
Arguably, it would be nicer if dev_appserver.py could use real GCS
files, but at this time it does not support that option.
This is a good time to try running a MapReduce job. Simply set the
output_bucket parameter (just the bucket name, no gs:// or /gs/ is
necessary) to the GCS bucket that you are using for the example, and click
Run. The status page will update and list your job as now running. You
can click the link to the job to follow its progress. When the job finishes
successfully, you can inspect its output by listing the contents of your
bucket. The sample provided writes to filenames of the form
test/<job id>-<shard>. For example:
$ gsutil ls gs://${GCS_BUCKET}/test/*
gs://bigquery-e2e/test/15784101297666AC77A71-0
gs://bigquery-e2e/test/15792505778554A7B9C41-0
This list command displays the output file generated by your job.
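These filenames come from expanding the naming_format string set in adjust_spec, with $id replaced by the MapReduce job ID and $num by the shard number. The expansion can be mimicked with Python's string.Template (the job ID below is hypothetical):

```python
from string import Template

# naming_format from adjust_spec: $id -> job ID, $num -> shard number.
naming_format = Template('test/$id-$num')

# Hypothetical job ID and shard number.
path = naming_format.substitute(id='15784101297666AC77A71', num='0')
print(path)
# test/15784101297666AC77A71-0
```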