There are two options: separating the load job creation from moving the bytes to the service, and moving the bytes in the same HTTP request that creates the load job. There are a couple of advantages to passing the job in the initial HTTP request and then transmitting the data over multiple requests.
• Some types of request errors can be caught early, so that you are
notified before actually transferring all the bytes. Currently the
validation is limited, but over time it is likely to become more
comprehensive.
• When you do not require a copy of the data in GCS, a direct upload to
BigQuery avoids explicit management of the files in GCS and associated
charges.
• The transfer is accomplished via a protocol that allows interrupted
uploads to be resumed rather than restarted.
Defining the load job before moving the bytes to BigQuery sounds like it
requires time travel; fortunately nothing so advanced is involved. This mode
of operation is achieved by having the job insertion HTTP request return
a URL that you use to upload the data to be loaded. BigQuery does not
start processing the request until you start pushing the data into the service.
Error conditions that can be detected by just inspecting the job creation
request are reported before the upload location is returned. Of course, the
operation can still fail after you start uploading the data if there is an error
parsing the bytes supplied, but this should not be surprising.
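Under the covers this is Google's standard resumable upload protocol. The following is a minimal sketch of the raw exchange, assuming an already-obtained OAuth2 access token and the PROJECT_ID and body values used elsewhere in this section; the access token variable and file name are placeholders for illustration.
import json
import httplib2

upload_url = ('https://www.googleapis.com/upload/bigquery/v2/'
              'projects/%s/jobs?uploadType=resumable' % PROJECT_ID)
headers = {
    'Authorization': 'Bearer %s' % ACCESS_TOKEN,  # placeholder token
    'Content-Type': 'application/json',
    'X-Upload-Content-Type': 'application/octet-stream',
}

http = httplib2.Http()
# Step 1: create the load job. Only the job configuration is sent here;
# no data bytes are transferred yet.
response, _ = http.request(upload_url, method='POST',
                           body=json.dumps(body), headers=headers)
# The URL to which the data should be uploaded is returned in the
# Location header of the response.
session_uri = response['location']

# Step 2: push the bytes to the returned URL. BigQuery starts processing
# the job only once the data arrives.
with open('sample.csv', 'rb') as f:
    http.request(session_uri, method='PUT', body=f.read(),
                 headers={'Content-Type': 'application/octet-stream'})
If the transfer is interrupted, the same session URI can be used to resume the upload from where it left off rather than starting over.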
The details of the protocol are neatly wrapped up when you use the
Google-supplied API client libraries. The following code sample shows how
it is done in the Python client library.
from apiclient.http import MediaFileUpload

upload = MediaFileUpload('sample.csv',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)
result = jobs.insert(projectId=PROJECT_ID,
                     body=body,
                     media_body=upload).execute()
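The sample assumes that PROJECT_ID, the jobs collection, and the job configuration body have already been set up. A minimal sketch of that surrounding context, using placeholder dataset, table, and schema names, might look like this:
from apiclient.discovery import build

PROJECT_ID = 'your-project-id'  # placeholder project ID

# `http` is assumed to be an httplib2.Http instance that has already been
# authorized with OAuth2 credentials (for example via credentials.authorize(http)).
service = build('bigquery', 'v2', http=http)
jobs = service.jobs()

# Load job configuration: where the uploaded bytes should land and how to parse them.
body = {
    'configuration': {
        'load': {
            'destinationTable': {
                'projectId': PROJECT_ID,
                'datasetId': 'sample_dataset',   # placeholder dataset
                'tableId': 'sample_table',       # placeholder table
            },
            'sourceFormat': 'CSV',
            'schema': {
                'fields': [
                    {'name': 'word', 'type': 'STRING'},    # placeholder schema
                    {'name': 'count', 'type': 'INTEGER'},
                ]
            }
        }
    }
}
Because the upload was created with resumable=True, the client library negotiates the upload URL and resumes interrupted transfers on your behalf; for large files you can also pass a chunksize to MediaFileUpload and drive the transfer incrementally with the request's next_chunk() method.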