        return None
    print '%s size: %d' % (uri, file_size)
    if self.download_dir is not None:
        self.download_file(gcs_object)
    return file_size
To use the GcsReader, you can write code such as the following:
$ python
>>> gcs_bucket = 'bigquery-e2e'
>>> from gcs_reader import GcsReader
>>> GcsReader(gcs_bucket=gcs_bucket,
...     download_dir='/tmp/bigquery').read('shakespeare.json')
gs://bigquery-e2e/shakespeare.json size: 13019156
Downloading: gs://bigquery-e2e/shakespeare.json to /tmp/bigquery/shakespeare.json
13019156
This will read the GCS file gs://bigquery-e2e/shakespeare.json
and download it to the /tmp/bigquery directory on your local machine.
The data will be downloaded in 1 MB chunks using the HTTP resumable
download protocol. The details of the code aren't important, but the
subsequent examples in this section use the GcsReader to download files
after running BigQuery extract jobs, so understanding what it does at a high
level is useful.
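As a rough illustration of what GcsReader does at a high level, the following sketch copies a stream in fixed-size 1 MB chunks. The function name `download_in_chunks` and the in-memory streams are illustrative, not part of the book's code; a real resumable download would issue HTTP range requests per chunk rather than reading from a local stream.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size described in the text


def download_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a readable stream to a writable stream one chunk at a time.

    This mirrors the shape of a resumable download: each iteration
    transfers at most one chunk, so an interrupted transfer could
    resume from the last byte written instead of restarting.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Simulated 2.5 MB "object" so the sketch runs without GCS credentials.
src = io.BytesIO(b'x' * (2 * CHUNK_SIZE + 512 * 1024))
dst = io.BytesIO()
size = download_in_chunks(src, dst)
print(size)  # 2621440
```

Chunked transfer also bounds memory use: only one chunk is held in memory at a time, regardless of the object's size.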
This example uses the same authentication code from Chapter 5, with a couple of additional tweaks. It adds a method to the auth module that returns a GCS service object, similar to the method that creates an authenticated BigQuery service object. The other change to the auth module is adding the GCS OAuth2 scope to the list of required scopes. Unfortunately, this requires you to re-authenticate. To prevent errors when you run these examples, the saved credential file is named differently from the one used in other chapters. If you run these examples and get an HTTP 403 "Insufficient Permissions" error, try deleting the saved credentials (~/bigquery_credentials.dat) and rerunning the operation.
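A minimal sketch of the scope change described above might look like the following. The variable names are illustrative (the book's auth module is not shown here); the two scope URLs themselves are real OAuth2 scopes for BigQuery and read-only GCS access.

```python
# Hypothetical scope list mirroring the auth-module tweak described in
# the text: the GCS scope is appended to the existing BigQuery scope,
# which is why previously saved credentials must be re-obtained.
BIGQUERY_SCOPE = 'https://www.googleapis.com/auth/bigquery'
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'

SCOPES = [BIGQUERY_SCOPE, GCS_SCOPE]
print(' '.join(SCOPES))
```

Because a cached credential records the scopes it was granted, adding a scope invalidates it; saving under a different filename keeps these examples from clobbering the credentials used by other chapters.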