Database Reference
In-Depth Information
compute a condition that spans multiple rows and use it in a cohort analysis
query.
TIP
In this chapter we have mostly used explicit field references in
GROUP
BY
and
ORDER BY
clauses. This generally makes the query a little easier
to understand. The older SQL standard and many popular
implementations also support using column indices in these clauses.
BigQuery also supports this feature, and we have used it in the queries
that appear in this recipe and also in other chapters. Often it is useful
when you are experimenting with queries, and in some cases it even
improves the readability of the query. However, the feature is not
standardized, so if that is a concern, you may want to avoid it in your
production code.
In the Wikipedia example, because each record contains a single edit, you
must look at all the edits for a given contributor to determine if the
contributor belongs to the cohort. You can evaluate each record to see if it
establishes membership or not, but then you need to combine all the results
to see if there is
at least
one record where the condition evaluates to true.
The
SOME
and
EVERY
aggregate functions perform the logical aggregation
required for this operation.
SOME
computes the disjunction of its inputs; it
is true only if there is at least one input that is true.
EVERY
computes the
conjunction of its inputs; it is false if at least one input is false.
SELECT bush_all, bush_some, obama_all, obama_some,
COUNT(1) AS num, AVG(edits) AS avg_edits
FROM (
SELECT contributor_id,
EVERY(bush_edit) AS bush_all, SOME(bush_edit) AS
bush_some,
EVERY(obama_edit) AS obama_all, SOME(obama_edit)
AS obama_some,
COUNT(1) AS edits
FROM (
SELECT