Database Reference
In-Depth Information
INTEGER((TIMESTAMP_TO_SEC(start) -
TIMESTAMP_TO_SEC(TIMESTAMP('2014-01-15'))) /
60) AS start_min,
INTEGER((TIMESTAMP_TO_SEC(end) -
TIMESTAMP_TO_SEC(TIMESTAMP('2014-01-15'))) /
60) AS end_min
FROM [bigquery-e2e:ch10.sessions]
WHERE start > '2014-01-14 23:00:00'
AND end < '2014-01-16 01:00:00') s
CROSS JOIN [bigquery-e2e:ch10.minutes] m
WHERE s.start_min <= m.index AND m.index <= s.end_min
GROUP BY minute)
GROUP BY hour
ORDER BY hour
To make this work you need to load a file containing the minutes. Actually
all you need is a table with a single column containing the integers [0,
24 * 60]. We have generated the table bigquery-e2e:ch10 . minutes
by simply loading a text file containing these integers. If your input is
spread across tables covering different time periods, then you need to adjust
the query for sessions that span the boundaries of your tables. Since this
example only uses a single table, all you had to do was ensure that the inner
WHERE clause selected sessions from an interval slightly larger than the day
you are analyzing.
Abstracting a bit from this specific problem, there are two important points
to note. Firstly, JOIN , and CROSS JOIN in particular, allow you to expand
the number of input rows, effectively multiplying the left table with the
right table. Second, although the ON clause in BigQuery joins is restricted to
equality comparisons, you can always use the WHERE clause to apply more
general filtering conditions to the generated rows. The only danger to avoid
is both tables being large in which case the cross product is prohibitive.
Summary
This was a long chapter, heavy on examples and details, yet it covered only
a small part of what can be done with BigQuery. For programmers familiar
Search WWH ::




Custom Search