Database Reference
In-Depth Information
AVG(edits) AS avg_edits
FROM (
SELECT
contributor_id,
MIN(IF(bush_edit, ts, 99999999999)) AS first_bush ,
SOME(bush_edit) AS bush_some,
MIN(IF(obama_edit, ts, 99999999999)) AS
first_obama ,
SOME(obama_edit) AS obama_some,
COUNT(1) AS edits
FROM (
SELECT
contributor_id,
timestamp AS ts ,
(LOWER(title) = 'george w. bush') AS bush_edit,
(LOWER(title) = 'barrack obama') AS obama_edit
FROM [publicdata:samples.wikipedia])
WHERE bush_edit OR obama_edit
GROUP EACH BY 1)
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
The main feature to observe is that you can compute cohort membership in
the top-level query. In this case, you can subdivide the cohort of contributors
that have edited the pages of Obama and Bush into separate cohorts
depending on which page was edited earlier. You can also see that we
tweaked the rows being considered for the total statistics by moving the
condition from the HAVING clause to the WHERE clause, limiting the query to
rows involving the pages of interest, which is a condition we can apply prior
to the aggregation.
The essential piece of this technique is using a nested GROUP EACH
specifying the fields that identify the entity that is being evaluated for
inclusion in a cohort. After you have a handle on this bit, you can play with
the row filter conditions and aggregations at different levels of the query to
extract the information you need.
Search WWH ::




Custom Search