There are two options: separating the load job creation from moving the bytes to the service, and moving the bytes in the same HTTP request that creates the load job. There are a couple of advantages to passing the job in the initial HTTP request and then transmitting the data over multiple requests.
• Some types of request errors can be caught early, so that you are
notified before actually transferring all the bytes. Currently the
validation is limited, but over time it is likely to become more
comprehensive.
• When you do not require a copy of the data in GCS, a direct upload to
BigQuery avoids explicit management of the files in GCS and associated
charges.
• The transfer is accomplished via a protocol that allows interrupted
uploads to be resumed rather than restarted.
Defining the load job before moving the bytes to BigQuery sounds like it
requires time travel; fortunately nothing so advanced is involved. This mode
of operation is achieved by having the job insertion HTTP request return
a URL that you use to upload the data to be loaded. BigQuery does not
start processing the request until you start pushing the data into the service.
Error conditions that can be detected by just inspecting the job creation
request are reported before the upload location is returned. Of course, the
operation can still fail after you start uploading the data if there is an error
parsing the bytes supplied, but this should not be surprising.
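Under the covers this is Google's standard resumable upload protocol. The following is a minimal sketch of the raw exchange, assuming an already-obtained OAuth2 access token and the PROJECT_ID and body values used elsewhere in this section; the access token variable and file name are placeholders for illustration.
import json
import httplib2

upload_url = ('https://www.googleapis.com/upload/bigquery/v2/'
              'projects/%s/jobs?uploadType=resumable' % PROJECT_ID)
headers = {
    'Authorization': 'Bearer %s' % ACCESS_TOKEN,  # placeholder token
    'Content-Type': 'application/json',
    'X-Upload-Content-Type': 'application/octet-stream',
}

http = httplib2.Http()
# Step 1: create the load job. Only the job configuration is sent here;
# no data bytes are transferred yet.
response, _ = http.request(upload_url, method='POST',
                           body=json.dumps(body), headers=headers)
# The URL to which the data should be uploaded is returned in the
# Location header of the response.
session_uri = response['location']

# Step 2: push the bytes to the returned URL. BigQuery starts processing
# the job only once the data arrives.
with open('sample.csv', 'rb') as f:
    http.request(session_uri, method='PUT', body=f.read(),
                 headers={'Content-Type': 'application/octet-stream'})
If the transfer is interrupted, the same session URI can be used to resume the upload from where it left off rather than starting over.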
The details of the protocol are neatly wrapped up when you use the
Google-supplied API client libraries. The following code sample shows how
it is done in the Python client library.
from apiclient.http import MediaFileUpload

upload = MediaFileUpload('sample.csv',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)
result = jobs.insert(projectId=PROJECT_ID,
                     body=body,
                     media_body=upload).execute()
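The sample assumes that PROJECT_ID, the jobs collection, and the job configuration body have already been set up. A minimal sketch of that surrounding context, using placeholder dataset, table, and schema names, might look like this:
from apiclient.discovery import build

PROJECT_ID = 'your-project-id'  # placeholder project ID

# `http` is assumed to be an httplib2.Http instance that has already been
# authorized with OAuth2 credentials (for example via credentials.authorize(http)).
service = build('bigquery', 'v2', http=http)
jobs = service.jobs()

# Load job configuration: where the uploaded bytes should land and how to parse them.
body = {
    'configuration': {
        'load': {
            'destinationTable': {
                'projectId': PROJECT_ID,
                'datasetId': 'sample_dataset',   # placeholder dataset
                'tableId': 'sample_table',       # placeholder table
            },
            'sourceFormat': 'CSV',
            'schema': {
                'fields': [
                    {'name': 'word', 'type': 'STRING'},    # placeholder schema
                    {'name': 'count', 'type': 'INTEGER'},
                ]
            }
        }
    }
}
Because the upload was created with resumable=True, the client library negotiates the upload URL and resumes interrupted transfers on your behalf; for large files you can also pass a chunksize to MediaFileUpload and drive the transfer incrementally with the request's next_chunk() method.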