 u'kind': u'bigquery#queryResponse',
 u'rows': [{u'f': [{u'v': u'164656'}, {u'v': u'1.394988017942355E9'}]}],
 u'schema': {u'fields': [{u'mode': u'NULLABLE',
                          u'name': u'f0_',
                          u'type': u'INTEGER'},
                         {u'mode': u'NULLABLE',
                          u'name': u'f1_',
                          u'type': u'FLOAT'}]},
 u'totalBytesProcessed': u'0',
 u'totalRows': u'1'}
Note that the number of bytes processed is 1332943 in the first query but
drops to 0 in the second query. The cacheHit flag also changes from
False to True.
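As a rough illustration of how these fields might be read in client code, the following Python sketch converts the string-encoded cells to typed values using the schema and reports the bytes processed and the cacheHit flag. The helper name summarize_response is hypothetical, the sample dictionary is reconstructed from the values shown above, and only the INTEGER and FLOAT types that appear in this response are converted.

```python
def summarize_response(response):
    """Return (rows, bytes_processed, cache_hit) from a query response dict.

    Hypothetical helper: only INTEGER and FLOAT are converted; any other
    type is passed through as a string.
    """
    converters = {'INTEGER': int, 'FLOAT': float}
    fields = response['schema']['fields']
    rows = []
    for row in response.get('rows', []):
        values = []
        for field, cell in zip(fields, row['f']):
            raw = cell['v']
            convert = converters.get(field['type'], str)
            # NULLABLE fields may carry None instead of a string value.
            values.append(None if raw is None else convert(raw))
        rows.append(tuple(values))
    bytes_processed = int(response['totalBytesProcessed'])
    return rows, bytes_processed, response.get('cacheHit', False)


# Sample reconstructed from the second (cached) response shown above.
response = {
    'kind': 'bigquery#queryResponse',
    'cacheHit': True,
    'rows': [{'f': [{'v': '164656'}, {'v': '1.394988017942355E9'}]}],
    'schema': {'fields': [
        {'mode': 'NULLABLE', 'name': 'f0_', 'type': 'INTEGER'},
        {'mode': 'NULLABLE', 'name': 'f1_', 'type': 'FLOAT'},
    ]},
    'totalBytesProcessed': '0',
    'totalRows': '1',
}

rows, bytes_processed, cache_hit = summarize_response(response)
print(rows, bytes_processed, cache_hit)
```

A response with zero bytes processed and cacheHit set to True, as here, indicates that the result came from the cache.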
The operation of the cache relies on the anonymous tables described in a
previous section. Because BigQuery is free to give these tables any names
that it wants, it generates a deterministic name from the query. To create
this name, it takes a parsed version of the query that you're running and the
last modified times of all of the tables involved in the query, and computes
a cryptographic hash. It then uses this cryptographic hash as the name of
the table. The next time the query is run, BigQuery checks for the existence
of a table with this name, and if it already exists, just uses that in the
response. You can see this in action by checking the destination table names
of back-to-back query jobs that run the same query—they will be the same
table.
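The naming scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not BigQuery's actual algorithm: the whitespace normalization stands in for real query parsing, and the choice of SHA-1, the "anon" prefix, and the function name cache_table_name are all assumptions.

```python
import hashlib


def cache_table_name(query, last_modified_times):
    """Derive a deterministic table name from a query and its input tables.

    query: the query text (whitespace normalization is a stand-in for
        the parsed form of the query).
    last_modified_times: dict mapping table name to last modified time
        in milliseconds.
    """
    normalized = ' '.join(query.split())
    parts = [normalized]
    # Sort so the result does not depend on dictionary ordering.
    for table, modified in sorted(last_modified_times.items()):
        parts.append('%s@%d' % (table, modified))
    digest = hashlib.sha1('\n'.join(parts).encode('utf-8')).hexdigest()
    # The "anon" prefix is hypothetical.
    return 'anon' + digest


tables = {'publicdata:samples.shakespeare': 1394988017942}
first = cache_table_name('SELECT COUNT(*) FROM t', tables)
second = cache_table_name('SELECT  COUNT(*)\nFROM t', tables)
# Same query, same table state: same name, so the cached table is reused.
print(first == second)

tables['publicdata:samples.shakespeare'] += 1
# A modified input table yields a different name, so the cache is missed.
print(cache_table_name('SELECT COUNT(*) FROM t', tables) == first)
```

Folding the last modified times into the hash means that a stale result is never looked up after an input table changes, because the lookup name itself changes.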
Sometimes, you might want to run a query only if it is already cached. Maybe
you want to avoid running up any more charges, or maybe the query takes a
long time and you don't want to wait for it to execute. Setting the
createDisposition in the query configuration to CREATE_NEVER tells
BigQuery not to create the cached table if it doesn't already exist.
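A minimal sketch of such a query configuration, written as the Python dictionary that would form the body of a job insert request, might look as follows. The helper name cached_only_job_body is hypothetical, and submitting the job (authentication, project ID, the insert call itself) is deliberately left out.

```python
def cached_only_job_body(query):
    """Build a query job body that should succeed only on a cache hit.

    Hypothetical helper: it only constructs the request body and does
    not submit it.
    """
    return {
        'configuration': {
            'query': {
                'query': query,
                # Do not create the destination (cache) table if it is
                # missing; only an existing cached result can be used.
                'createDisposition': 'CREATE_NEVER',
            }
        }
    }


body = cached_only_job_body(
    'SELECT COUNT(*), AVG(LENGTH(word)) '
    'FROM [publicdata:samples.shakespeare]')
print(body['configuration']['query']['createDisposition'])
```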
Returning Large Results
By default, BigQuery query responses are limited to 128 MB. There are some
architectural reasons for this limitation, which are discussed in Chapter 9,
“Understanding Query Execution,” but there is also a common-sense
justification: When you're dealing with Big Data, it is easy to generate