This query generates three columns for the counts of the longest words in
King Lear, Tempest, and Macbeth. It also includes a total column that is the
total across all the works in the table. As you might guess, most of the counts
are zero, because long words are rare and tend to appear in only a single
corpus.
In many cases the values to pivot are not known up front. Even in this query
we picked three works at random. In realistic reports there is usually a better
way to select the columns. For example, we could identify the three largest
works and use those in our column expressions.
SELECT CONCAT(
  CONCAT("SUM(IF(corpus = '", corpus),
  "', corpus_total, 0))")
FROM (
  SELECT corpus, SUM(word_count) AS corpus_total
  FROM [publicdata:samples.shakespeare]
  GROUP BY corpus
  ORDER BY corpus_total DESC
  LIMIT 3)
If the number of columns grows large, the query can be tedious to write, but
usually such queries are generated programmatically. The previous query
generates the individual column expressions. In practice, you might not
actually want BigQuery to handle your formatting, but the query illustrates
how you might feed a query result into a string-formatting library to
generate the final query.
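As a minimal sketch of that approach, the Python snippet below builds the final pivot query from a list of corpus names. The names and the shape of the final query are illustrative assumptions, standing in for whatever the inner LIMIT 3 query actually returns; any string-formatting library would do equally well.

```python
# Hypothetical corpus names standing in for the result of the LIMIT 3 query.
corpora = ["hamlet", "kinglear", "othello"]

# Build one SUM(IF(...)) column expression per corpus, mirroring the
# expressions generated by the CONCAT query above.
column_exprs = ",\n  ".join(
    "SUM(IF(corpus = '{0}', corpus_total, 0)) AS {0}".format(c)
    for c in corpora
)

# Assemble a final pivot query (one possible shape) around those columns.
pivot_query = (
    "SELECT word,\n"
    "  {0}\n"
    "FROM (\n"
    "  SELECT word, corpus, SUM(word_count) AS corpus_total\n"
    "  FROM [publicdata:samples.shakespeare]\n"
    "  GROUP BY word, corpus)\n"
    "GROUP BY word"
).format(column_exprs)

print(pivot_query)
```

Adding or removing a corpus name is now a one-line change, which is the main reason such queries are generated rather than written by hand.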
Cohort Analysis
When trying to make sense of Big Data, it is common to work with cohorts:
sets of entities that share a specific property, for example:
• Users who accessed a given application on their phone
• Wikipedia contributors that edited a particular title
• Cities that have more than one ZIP code
In most cases it is not possible to determine if an entity belongs to a
particular cohort by inspecting a single record. Here you investigate how to