characteristics, and in many cases it is useful to load your Datastore table
into BigQuery to perform analytics and reporting. The bridge between these
two services is the App Engine facility for backing up Datastore tables
(Kinds, in Datastore parlance) to GCS. BigQuery can import these backups
from GCS and run queries over the data. Because the data in BigQuery is a
snapshot of the Datastore table taken at the time of the backup, queries do
not run against live data. This limitation is a problem for real-time use
cases, but it is generally acceptable for reporting and analytics.
When you perform a Datastore backup to GCS, it produces, alongside the
data files themselves, a manifest file in GCS that describes the contents
of the backup. Importing the backup into BigQuery is simply a matter of
specifying the right source format and the GCS location of the manifest
file:
load_config['sourceFormat'] = 'DATASTORE_BACKUP'
load_config['sourceUris'] = ['gs://<backup bucket>/<backup manifest>']
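The two settings above slot into a complete load job request. The following is a minimal Python sketch, not the book's own code, that assembles a job body of the shape the BigQuery REST API's jobs.insert method accepts. The helper name datastore_backup_load_job, the choice of WRITE_TRUNCATE, and the sample project, dataset, table, and GCS names are assumptions for illustration. No schema is included because BigQuery derives it from the backed-up entities.

```python
import json


def datastore_backup_load_job(project_id, dataset_id, table_id, manifest_uri):
    """Build a BigQuery load job body for a Datastore backup (a sketch).

    manifest_uri is the GCS location of the backup's manifest file.
    """
    if not manifest_uri.startswith('gs://'):
        raise ValueError('manifest_uri must be a gs:// URI')
    load_config = {
        # Read a Datastore backup instead of CSV or JSON files.
        'sourceFormat': 'DATASTORE_BACKUP',
        # Point at the manifest, which describes the backup's data files.
        'sourceUris': [manifest_uri],
        'destinationTable': {
            'projectId': project_id,
            'datasetId': dataset_id,
            'tableId': table_id,
        },
        # Assumption: replace the previous snapshot on every load.
        'writeDisposition': 'WRITE_TRUNCATE',
        # No 'schema' key: BigQuery derives it from the backed-up entities.
    }
    return {'configuration': {'load': load_config}}


if __name__ == '__main__':
    # Hypothetical names; substitute your own project and backup location.
    body = datastore_backup_load_job(
        'my-project', 'datastore_snapshots', 'Customer',
        'gs://my-backup-bucket/my-backup.Customer.backup_info')
    print(json.dumps(body, indent=2))
```

Because each backup is a point-in-time snapshot, WRITE_TRUNCATE replaces the previous snapshot rather than appending duplicate entities; WRITE_APPEND would be the alternative if you wanted to accumulate snapshots in one table. With the google-api-python-client library, a body like this is typically submitted through service.jobs().insert(projectId=..., body=body).execute().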
Running this load job is simple and much like a CSV or JSON load from
GCS. However, a lot is going on behind the scenes. Most importantly, the
schema for the destination table is derived from the entities in your
backup. Following the schema generation algorithm and appreciating its
limitations requires a good understanding of the Datastore data model,
which is beyond the scope of this chapter on loading data into BigQuery.
Chapter 11 covers the full details of this integration, along with the
relevant Datastore concepts.
Errors
The beginning of the chapter mentioned that one of the challenges of
moving data is that the producer and consumer often have minor
disagreements about how the data should be represented. BigQuery is no
different: despite your best intentions, the data you ship to the service
may be rejected as invalid. Common issues include:
• Incorrectly formatted fields, for example, an invalid timestamp
representation
• Mismatches between the declared schema and supplied data
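Both kinds of problem can often be caught before a load job is submitted. The following is a minimal Python sketch of a client-side pre-check, assuming a hypothetical SCHEMA dictionary and helper names (validate_row, _is_timestamp) invented for illustration. It is not how BigQuery validates data: BigQuery performs its own checks at load time, and it accepts more timestamp formats than the two this sketch allows.

```python
from datetime import datetime

# Hypothetical declared schema: column name -> BigQuery-style type name.
SCHEMA = {'user_id': 'INTEGER', 'event': 'STRING', 'ts': 'TIMESTAMP'}

# Deliberately narrow: only these two layouts count as valid timestamps here.
_TS_FORMATS = ('%Y-%m-%d %H:%M:%S', '%Y-%m-%dT%H:%M:%S')


def _is_timestamp(value):
    """Return True if value is a string in one of _TS_FORMATS."""
    if not isinstance(value, str):
        return False
    for fmt in _TS_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


# One type check per schema type; bool is excluded from the numeric types.
_CHECKS = {
    'INTEGER': lambda v: isinstance(v, int) and not isinstance(v, bool),
    'FLOAT': lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    'STRING': lambda v: isinstance(v, str),
    'BOOLEAN': lambda v: isinstance(v, bool),
    'TIMESTAMP': _is_timestamp,
}


def validate_row(row, schema):
    """Return a list of problems in one row; an empty list means it looks OK."""
    problems = []
    # Fields the schema does not declare are a schema mismatch.
    for name in row:
        if name not in schema:
            problems.append(f'unexpected field {name!r}')
    # Declared fields must hold a value of the declared type.
    for name, bq_type in schema.items():
        if name not in row or row[name] is None:
            continue  # treat absent values as NULL in a NULLABLE column
        if not _CHECKS[bq_type](row[name]):
            problems.append(f'field {name!r}: {row[name]!r} is not a valid {bq_type}')
    return problems


if __name__ == '__main__':
    rows = [
        {'user_id': 1, 'event': 'login', 'ts': '2013-05-01 12:30:00'},
        {'user_id': 'two', 'event': 'login', 'ts': '05/01/2013'},
        {'user_id': 3, 'evnt': 'logout'},
    ]
    for i, row in enumerate(rows):
        print(i, validate_row(row, SCHEMA))
```

The second sample row fails on both an incorrectly formatted timestamp and a value that does not match its declared type, while the third carries a misspelled field name that the schema does not declare. Rejecting such rows locally is optional; BigQuery's load configuration also offers a maxBadRecords setting that tolerates a limited number of invalid rows.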