        'destinationFormat': 'NEWLINE_DELIMITED_JSON',
        'destinationUris': destination_uris}
    return {'extract': extract_config}

def run_extract_job(job_runner, gcs_reader,
                    source_project_id,
                    source_dataset_id, source_table_id):
    '''Runs a BigQuery extract job and reads the results.'''
    timestamp = int(time.time())
    gcs_object = 'output/%s.%s_%d.json' % (
        source_dataset_id,
        source_table_id,
        timestamp)
    destination_uri = gcs_reader.make_uri(gcs_object)
    job_config = make_extract_config(
        source_project_id,
        source_dataset_id,
        source_table_id,
        [destination_uri])
    if not job_runner.start_job(job_config):
        return
    print json.dumps(job_runner.get_job(), indent=2)
    job_runner.wait_for_complete()
    gcs_reader.read(gcs_object)
Because the details of running a job and downloading a file from GCS are abstracted away from this listing, the code is short and straightforward. The method run_extract_job picks a GCS path name based on the timestamp and name of the table being downloaded, and then creates an Extract job to write to that GCS path. Once the job completes, the listing downloads the object using the GcsReader. The following snippet is an example usage of this code that exports the table publicdata:samples.shakespeare and downloads the results:
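The config-building half of that call chain can also be traced end-to-end without any network access. Here is a minimal, self-contained sketch: the head of make_extract_config falls outside this excerpt, so its sourceTable block is an assumption based on the standard BigQuery jobs.insert request body, and the bucket name is hypothetical.

```python
import time

def make_extract_config(source_project_id, source_dataset_id,
                        source_table_id, destination_uris):
    '''Builds the configuration for a BigQuery Extract job.

    The sourceTable fields are an assumption reconstructed from the
    tail of the listing above and the jobs.insert request body.'''
    extract_config = {
        'sourceTable': {
            'projectId': source_project_id,
            'datasetId': source_dataset_id,
            'tableId': source_table_id},
        'destinationFormat': 'NEWLINE_DELIMITED_JSON',
        'destinationUris': destination_uris}
    return {'extract': extract_config}

# Compose the GCS object name the same way run_extract_job does.
timestamp = int(time.time())
gcs_object = 'output/%s.%s_%d.json' % (
    'samples', 'shakespeare', timestamp)
# Hypothetical bucket; gcs_reader.make_uri would supply the real one.
destination_uri = 'gs://example_bucket/' + gcs_object
config = make_extract_config('publicdata', 'samples', 'shakespeare',
                             [destination_uri])
```

Running this produces the same job configuration dict that run_extract_job would hand to job_runner.start_job, which can be useful when debugging the request body before submitting a real job.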