External Data Processing - Google BigQuery Analytics

create files in your output GCS bucket. To grant access, substitute your own AppEngine ID for APP_ID and your own GCS bucket for GCS_BUCKET and issue the following commands:

$ APP_ID=bigquery-mr-sample
$ GCS_BUCKET=bigquery-e2e
$ gsutil acl ch \
    -u ${APP_ID}@appspot.gserviceaccount.com:W \
    gs://${GCS_BUCKET}
Updated ACL on gs://bigquery-e2e/
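The same substitution can be scripted. The following is a minimal sketch, assuming Python 3 and that gsutil is installed and authenticated. The helper name build_acl_command is hypothetical and does not come from the original text. It only assembles the argument list, so you can inspect the command before running it.

```python
import shlex


def build_acl_command(app_id, gcs_bucket):
    """Assemble the gsutil arguments that grant the app's service account
    write (W) access to the bucket.

    Hypothetical helper for illustration; it mirrors the shell commands above.
    """
    service_account = "{}@appspot.gserviceaccount.com".format(app_id)
    return [
        "gsutil", "acl", "ch",
        "-u", "{}:W".format(service_account),
        "gs://{}".format(gcs_bucket),
    ]


command = build_acl_command("bigquery-mr-sample", "bigquery-e2e")

# Print the command in shell-quoted form for inspection before running it.
print(" ".join(shlex.quote(arg) for arg in command))

# To execute it (assumes gsutil is on PATH and has credentials):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the bucket or application ID ever contains unusual characters.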

You can run a simple MapReduce with almost no additional code beyond the previous script. You just need to set some configuration parameters and save them in a file called mapreduce.yaml at the top level of the AppEngine project directory. Here is an example mapreduce.yaml file:

mapreduce:
- name: Add Zip Codes
  mapper:
    handler: add_zip.apply
    input_reader: mapreduce.input_readers.FileInputReader
    output_writer: mapreduce.output_writers._GoogleCloudStorageOutputWriter
    params_validator: validator.adjust_spec
    params:
    - name: files
      value: /gs/bigquery-e2e/chapters/12/add_zip_input.json
    - name: shards
      default: 1
    - name: format
      default: lines
    - name: output_bucket
      default: bigquery-e2e
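To make the nesting of the file concrete, here is a rough Python sketch of the structure the listing above would parse into. It assumes that params and params_validator sit under mapper, which is the usual layout but is not stated in the text. The dictionary is typed out by hand rather than loaded from the file, and the check_params helper is hypothetical, written only for illustration. It is not part of the MapReduce library.

```python
# Hand-written equivalent of the mapreduce.yaml listing. The nesting is an
# assumption based on the usual layout of the file.
config = {
    "mapreduce": [
        {
            "name": "Add Zip Codes",
            "mapper": {
                "handler": "add_zip.apply",
                "input_reader": "mapreduce.input_readers.FileInputReader",
                "output_writer": (
                    "mapreduce.output_writers._GoogleCloudStorageOutputWriter"
                ),
                "params_validator": "validator.adjust_spec",
                "params": [
                    {
                        "name": "files",
                        "value": "/gs/bigquery-e2e/chapters/12/add_zip_input.json",
                    },
                    {"name": "shards", "default": 1},
                    {"name": "format", "default": "lines"},
                    {"name": "output_bucket", "default": "bigquery-e2e"},
                ],
            },
        }
    ]
}


def check_params(job):
    """Return {parameter name: value or default} for one job entry.

    Hypothetical helper: raises ValueError if a parameter has no name, or has
    neither a fixed value nor a default.
    """
    resolved = {}
    for param in job["mapper"]["params"]:
        if "name" not in param:
            raise ValueError("parameter without a name: %r" % (param,))
        if "value" in param:
            resolved[param["name"]] = param["value"]
        elif "default" in param:
            resolved[param["name"]] = param["default"]
        else:
            raise ValueError(
                "parameter %r has neither value nor default" % param["name"]
            )
    return resolved


resolved = check_params(config["mapreduce"][0])
print(resolved)
```

In this example files is given with value while the other three parameters use default, which is why the helper accepts either key.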
