could not be created (in which case the work done to transfer the bytes is
lost); otherwise, it will contain the job created. Notice that in the example
we have base64-encoded the data in the request body and added the
Content-Transfer-Encoding header. This is not strictly necessary, but
it can help mitigate issues with problematic HTTP proxies.
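For concreteness, here is a minimal sketch of such a multipart request built by hand, assuming the standard v2 upload endpoint. The project, dataset, table, file, and token values are placeholders, and any HTTP client would work (requests is used here for brevity):

```python
import base64
import json

import requests  # any HTTP client works; used here for brevity

ACCESS_TOKEN = "placeholder-oauth2-token"  # obtained via your OAuth2 flow
BOUNDARY = "xxUPLOADxx"

# Hypothetical load job configuration; all names are placeholders.
job_config = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "CSV",
        }
    }
}

with open("data.csv", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

# multipart/related body: the first part carries the job configuration,
# the second part carries the base64-encoded data itself.
body = (
    "--{b}\r\n"
    "Content-Type: application/json; charset=UTF-8\r\n\r\n"
    "{config}\r\n"
    "--{b}\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Content-Transfer-Encoding: base64\r\n\r\n"
    "{data}\r\n"
    "--{b}--\r\n"
).format(b=BOUNDARY, config=json.dumps(job_config), data=encoded)

resp = requests.post(
    "https://www.googleapis.com/upload/bigquery/v2/projects/my-project/jobs"
    "?uploadType=multipart",
    headers={
        "Authorization": "Bearer " + ACCESS_TOKEN,
        "Content-Type": 'multipart/related; boundary="{}"'.format(BOUNDARY),
    },
    data=body,
)
print(resp.json())  # the created job on success, or an error response
```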
When using the client library, you select this mode of operation by simply
leaving out resumable=True or setting it explicitly to False. However,
there is no good reason to use this mode when working through the client
libraries. The resumable flag defaults to False to maintain backward
compatibility with earlier versions of the client library and APIs that do
not support the Resumable Upload protocol. If for some reason you cannot
use the client library in your application, it is reasonable to implement
this multipart method rather than implement the full Resumable Upload
protocol, which requires more complicated code. Just be aware that you may
encounter failed HTTP requests when uploading large amounts of data with
this approach. The problem is that the likelihood of a random failure
affecting a request increases with the size of the request, so a very large
request can fail repeatedly due to intermittent network failures.
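Through the Python client library, the same single-request upload looks roughly like the following. MediaFileUpload and jobs().insert() are from google-api-python-client; the project, dataset, and table names are placeholders, and credential setup is omitted:

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Credential setup omitted for brevity; pass credentials= (or http=) to build().
service = build("bigquery", "v2")

# resumable defaults to False, which selects the single-request
# multipart upload described above.
media = MediaFileUpload("data.csv",
                        mimetype="application/octet-stream",
                        resumable=False)

job = service.jobs().insert(
    projectId="my-project",  # placeholder project ID
    body={"configuration": {"load": {
        "destinationTable": {
            "projectId": "my-project",
            "datasetId": "my_dataset",  # placeholder names
            "tableId": "my_table",
        },
        "sourceFormat": "CSV",
    }}},
    media_body=media,
).execute()
print(job["jobReference"])  # reference to the created load job
```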
That covers the different options you have for moving your data into
BigQuery. If you primarily work with installed software, it may seem odd to
move the data rather than move the software. On the other hand, if you are
familiar with cloud-based services, the process of moving your data into the
cloud will feel natural. This section has covered three separate methods for
transferring data. Google Cloud Storage is ideal if you would like to retain a
backup copy of your data outside of BigQuery. Once the data is uploaded to
GCS, importing it into BigQuery is simply a matter of referencing the files.
If you only need the data to be stored in BigQuery, the Resumable Upload
protocol is the best choice because it allows for large amounts of data to
be transferred robustly. Finally, if simplicity or minimizing the number of
HTTP requests is the most important consideration, you can use multipart
requests, but be aware that this method may not scale well to large data
sizes.
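To make the Cloud Storage option concrete, a load job that reads files already sitting in GCS simply lists their URIs in the configuration; no bytes travel with the request itself. A minimal sketch, with placeholder bucket and table names:

```python
# BigQuery reads the files directly from GCS; a single "*" wildcard
# lets one URI match many files.
gcs_load_config = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/exports/data-*.csv"],
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "CSV",
        }
    }
}
# Submitted as an ordinary (non-upload) jobs.insert request:
# service.jobs().insert(projectId="my-project", body=gcs_load_config).execute()
```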
Destination Table
Now you need to control where the data you are loading ends up inside
BigQuery. The load job configuration specifies a single destination table and
optionally includes a schema for the table. The destination table can live in
any dataset that you have permission to write to.
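As a sketch of that part of the configuration (field names follow the v2 jobs API; the table reference and schema shown are placeholders, and the schema is typically needed only when the destination table does not already exist):

```python
# Fragment of a load job configuration; all names are placeholders.
load_config_fragment = {
    "load": {
        "destinationTable": {  # fully qualified table reference
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "my_table",
        },
        "schema": {  # optional if the table already exists
            "fields": [
                {"name": "name", "type": "STRING"},
                {"name": "value", "type": "INTEGER"},
            ]
        },
    }
}
```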