Google Cloud Storage
A useful way to think about the role of Google Cloud Storage is to compare
it to the file system on your personal machine. Effectively, GCS is the file
system for the Google Cloud Platform. Every component of the platform,
including BigQuery, supports reading files stored in GCS. Unless you use
GCS as part of your application platform, your data will not already be hosted
in the service. However, there are a few compelling reasons to use GCS for
transferring data to BigQuery:
• Robust tools and APIs for uploading data
• Simple BigQuery integration
• Cost-effective data archival and backup solution
The drawback is that you have to pay for storing the data in GCS until you
load it into BigQuery, which can be wasteful if you already store your data in
a different location.
With GCS you have already completed the heavy lifting of moving the bytes
representing your data into the Google Cloud Platform even before initiating
the API call to BigQuery. This is accomplished via the GCS API
(https://developers.google.com/storage/docs/overview) or, more
simply, using one of the client tools (gsutil or the browser application).
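If you prefer to script the upload, the following is a minimal sketch using the
google-cloud-storage Python client library; the library choice and the local
filename are assumptions, since the text itself relies on gsutil and the REST API.
# Minimal sketch: upload a local file to GCS using the
# google-cloud-storage client library (an assumed alternative to
# gsutil; install with: pip install google-cloud-storage).
from google.cloud import storage

client = storage.Client()  # picks up your default project and credentials
bucket = client.bucket('bigquery-e2e')  # bucket name matches the later examples
blob = bucket.blob('chapters/06/sample.csv')
blob.upload_from_filename('sample.csv')  # hypothetical local file path
print('Uploaded to gs://%s/%s' % (bucket.name, blob.name))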
GCS objects are arranged according to a two-level naming scheme: a
top-level bucket name and an object name. Bucket names are globally
unique in the service, and object names are unique within a bucket. When
using the gsutil command-line tool to access a file stored in GCS, you will
use a URI of the form:
gs://<bucket>/<object>
BigQuery expects URIs in the same format when referencing GCS files in a
load job. Here is a snippet that sets the GCS source locations in a load configuration:
loadConfig['sourceUris'] = [
    'gs://bigquery-e2e/chapters/06/sample.csv',
    'gs://bigquery-e2e/chapters/06/sample_*',
]
You can see that a single load job can specify multiple GCS URIs and that
URIs can be wildcards. Following the terminology commonly used in shells,
these wildcard URIs are often referred to as globs.
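For context, here is a hedged sketch of where this configuration lives inside a
complete job resource as passed to the BigQuery REST API's jobs.insert method;
the destination table identifiers are illustrative assumptions.
job = {
    'configuration': {
        'load': {
            # GCS source files; wildcard (glob) URIs are allowed.
            'sourceUris': [
                'gs://bigquery-e2e/chapters/06/sample.csv',
                'gs://bigquery-e2e/chapters/06/sample_*',
            ],
            # Destination table identifiers are illustrative assumptions.
            'destinationTable': {
                'projectId': 'bigquery-e2e',
                'datasetId': 'ch06',
                'tableId': 'sample',
            },
        }
    }
}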