Managing Data Stored in BigQuery - Google BigQuery Analytics

Database Reference

In-Depth Information

Intermediate Results

As you have seen, query results are tables, so they can be used as sources

for further queries. This feature can be useful in a variety of application

scenarios. For example, consider the problem of exploring the mobile

records collected according to the hour and state in which they were

collected. Potentially interesting statistics are:

• Counts by hour summed over all states

• Relative distribution by state in a particular time range

• Hourly distribution for a given state

Running any one of these queries on the base data each time a page is

rendered can be expensive. Because each result would require a unique

query, a simple cache would not be effective. But a single intermediate

table can make all these queries have a fixed, lower cost independent of

the underlying table size. You can construct this intermediate table with the

following query:

SELECT

DATE(l.ts) Day,

HOUR(l.ts) Hour,

z.state State,

COUNT(l.ts) Num

FROM [bigquery-e2e:ch11.sample_device_log] l

INNER JOIN [bigquery-e2e:reference.zip_codes] z

ON z.zip=l.location.zip

GROUP BY 1, 2, 3

ORDER BY 1, 2, 3;

The result table is not suitable for direct rendering because it has 7 × 24 ×

50 = 8400 rows but is small enough that the cost of a query on the table is

negligible. A further rollup query, one that performs additional aggregation,

on this result table can be used to generate a summary view. Filter queries

that select a subset of rows corresponding to specific states and time ranges

can be used to explore details. Also note that identical queries referencing

this table will benefit from the auto-caching feature because this result

table is effectively frozen until the query that generated it is run again. By

virtue of this feature, frequently accessed graphs will end up being cached

and returning immediately. It is also worth noting that the size of this

Search WWH ::

Custom Search

Home