This query generates three columns for the counts of the longest words in
King Lear, Tempest, and Macbeth. It also includes a total column that is the
total across all the works in the table. As you might guess, most of the counts
are zero, because long words are rare and tend to appear in only a single
corpus.
In many cases the values to pivot are not known up front. Even in this query
we picked three works at random. In realistic reports there is usually a better
way to select the columns. For example, we could identify the three largest
works and use those in our column expressions.
SELECT CONCAT(
  CONCAT("SUM(IF(corpus = '", corpus),
  "', corpus_total, 0))")
FROM (
  SELECT corpus, SUM(word_count) AS corpus_total
  FROM [publicdata:samples.shakespeare]
  GROUP BY corpus
  ORDER BY corpus_total DESC
  LIMIT 3)
If the number of columns grows large, the query can be tedious to write, but
usually such queries are generated programmatically. The previous query
generates the individual column expressions. In practice, you might not
actually want BigQuery to handle your formatting, but the query illustrates
how you might feed a query result into a string-formatting library to
generate the final query.
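As a minimal sketch of that approach, the Python snippet below builds the final pivot query from a list of corpus names. The names and the shape of the final query are illustrative assumptions, standing in for whatever the inner LIMIT 3 query actually returns; any string-formatting library would do equally well.

```python
# Hypothetical corpus names standing in for the result of the LIMIT 3 query.
corpora = ["hamlet", "kinglear", "othello"]

# Build one SUM(IF(...)) column expression per corpus, mirroring the
# expressions generated by the CONCAT query above.
column_exprs = ",\n  ".join(
    "SUM(IF(corpus = '{0}', corpus_total, 0)) AS {0}".format(c)
    for c in corpora
)

# Assemble a final pivot query (one possible shape) around those columns.
pivot_query = (
    "SELECT word,\n"
    "  {0}\n"
    "FROM (\n"
    "  SELECT word, corpus, SUM(word_count) AS corpus_total\n"
    "  FROM [publicdata:samples.shakespeare]\n"
    "  GROUP BY word, corpus)\n"
    "GROUP BY word"
).format(column_exprs)

print(pivot_query)
```

Adding or removing a corpus name is now a one-line change, which is the main reason such queries are generated rather than written by hand.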
Cohort Analysis
When trying to make sense of Big Data, it is common to work with cohorts:
sets of entities that share a specific property, for example:
• Users who accessed a given application on their phone
• Wikipedia contributors that edited a particular title
• Cities that have more than one ZIP code
In most cases it is not possible to determine if an entity belongs to a
particular cohort by inspecting a single record. Here you investigate how to