Database Reference
In-Depth Information
INTEGER((TIMESTAMP_TO_SEC(start) -
TIMESTAMP_TO_SEC(TIMESTAMP('2014-01-15'))) /
60) AS start_min,
INTEGER((TIMESTAMP_TO_SEC(end) -
TIMESTAMP_TO_SEC(TIMESTAMP('2014-01-15'))) /
60) AS end_min
FROM [bigquery-e2e:ch10.sessions]
WHERE start > '2014-01-14 23:00:00'
AND end < '2014-01-16 01:00:00') s
CROSS JOIN [bigquery-e2e:ch10.minutes] m
WHERE s.start_min <= m.index AND m.index <= s.end_min
GROUP BY minute)
GROUP BY hour
ORDER BY hour
To make this work you need to load a file containing the minutes. Actually
all you need is a table with a single column containing the integers [0,
24 * 60]. We have generated the table
bigquery-e2e:ch10
.
minutes
by simply loading a text file containing these integers. If your input is
spread across tables covering different time periods, then you need to adjust
the query for sessions that span the boundaries of your tables. Since this
example only uses a single table, all you had to do was ensure that the inner
WHERE
clause selected sessions from an interval slightly larger than the day
you are analyzing.
Abstracting a bit from this specific problem, there are two important points
to note. Firstly,
JOIN
, and
CROSS JOIN
in particular, allow you to expand
the number of input rows, effectively
multiplying
the left table with the
right table. Second, although the
ON
clause in BigQuery joins is restricted to
equality comparisons, you can always use the
WHERE
clause to apply more
general filtering conditions to the generated rows. The only danger to avoid
is both tables being large in which case the cross product is prohibitive.
Summary
This was a long chapter, heavy on examples and details, yet it covered only
a small part of what can be done with BigQuery. For programmers familiar