Advanced Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

AVG(edits) AS avg_edits

FROM (

SELECT

contributor_id,

MIN(IF(bush_edit, ts, 99999999999)) AS first_bush ,

SOME(bush_edit) AS bush_some,

MIN(IF(obama_edit, ts, 99999999999)) AS

first_obama ,

SOME(obama_edit) AS obama_some,

COUNT(1) AS edits

FROM (

SELECT

contributor_id,

timestamp AS ts ,

(LOWER(title) = 'george w. bush') AS bush_edit,

(LOWER(title) = 'barrack obama') AS obama_edit

FROM [publicdata:samples.wikipedia])

WHERE bush_edit OR obama_edit

GROUP EACH BY 1)

GROUP BY 1, 2, 3

ORDER BY 1, 2, 3

The main feature to observe is that you can compute cohort membership in

the top-level query. In this case, you can subdivide the cohort of contributors

that have edited the pages of Obama and Bush into separate cohorts

depending on which page was edited earlier. You can also see that we

tweaked the rows being considered for the total statistics by moving the

condition from the HAVING clause to the WHERE clause, limiting the query to

rows involving the pages of interest, which is a condition we can apply prior

to the aggregation.

The essential piece of this technique is using a nested GROUP EACH

specifying the fields that identify the entity that is being evaluated for

inclusion in a cohort. After you have a handle on this bit, you can play with

the row filter conditions and aggregations at different levels of the query to

extract the information you need.

Search WWH ::

Custom Search

Home