        return None
    print '%s size: %d' % (uri, file_size)
    if self.download_dir is not None:
        self.download_file(gcs_object)
    return file_size
To use the GcsReader, you can write code such as the following:
$ python
>>> gcs_bucket = 'bigquery-e2e'
>>> from gcs_reader import GcsReader
>>> GcsReader(gcs_bucket=gcs_bucket,
...     download_dir='/tmp/bigquery').read('shakespeare.json')
gs://bigquery-e2e/shakespeare.json size: 13019156
Downloading: gs://bigquery-e2e/shakespeare.json to /tmp/bigquery/shakespeare.json
13019156
This will read the GCS file gs://bigquery-e2e/shakespeare.json
and download it to the /tmp/bigquery directory on your local machine.
The data will be downloaded in 1 MB chunks using the HTTP resumable
download protocol. The details of the code aren't important, but the
subsequent examples in this section use the GcsReader to download files
after running BigQuery extract jobs, so understanding what it does at a high
level is useful.
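As a rough illustration of what GcsReader does at a high level, the following sketch copies a stream in fixed-size 1 MB chunks. The function name `download_in_chunks` and the in-memory streams are illustrative, not part of the book's code; a real resumable download would issue HTTP range requests per chunk rather than reading from a local stream.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size described in the text


def download_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a readable stream to a writable stream one chunk at a time.

    This mirrors the shape of a resumable download: each iteration
    transfers at most one chunk, so an interrupted transfer could
    resume from the last byte written instead of restarting.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Simulated 2.5 MB "object" so the sketch runs without GCS credentials.
src = io.BytesIO(b'x' * (2 * CHUNK_SIZE + 512 * 1024))
dst = io.BytesIO()
size = download_in_chunks(src, dst)
print(size)  # 2621440
```

Chunked transfer also bounds memory use: only one chunk is held in memory at a time, regardless of the object's size.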
This example uses the same authentication code from Chapter 5, with a couple of additional tweaks. It adds a method to the auth module that returns a GCS service object, similar to the method that creates an authenticated BigQuery service object. The other change to the auth module is adding the GCS OAuth2 scope to the list of required scopes. Unfortunately, this requires you to re-authenticate. To prevent errors when you run these examples, the saved credential file is named differently from the one used in other chapters. If you run these examples and get an HTTP 403 "Insufficient Permissions" error, try deleting the saved credentials (~/bigquery_credentials.dat) and rerunning the operation.
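A minimal sketch of the scope change described above might look like the following. The variable names are illustrative (the book's auth module is not shown here); the two scope URLs themselves are real OAuth2 scopes for BigQuery and read-only GCS access.

```python
# Hypothetical scope list mirroring the auth-module tweak described in
# the text: the GCS scope is appended to the existing BigQuery scope,
# which is why previously saved credentials must be re-obtained.
BIGQUERY_SCOPE = 'https://www.googleapis.com/auth/bigquery'
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'

SCOPES = [BIGQUERY_SCOPE, GCS_SCOPE]
print(' '.join(SCOPES))
```

Because a cached credential records the scopes it was granted, adding a scope invalidates it; saving under a different filename keeps these examples from clobbering the credentials used by other chapters.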