Running Queries - Google BigQuery Analytics

This query returns the most commonly used words (lowercased) longer than five letters in any of Shakespeare's plays with “king” in the title, along with each word's count and the corpus in which it appears. It doesn't, however, total the frequency across plays; it returns the count of each word per play. Here are the results:

+------------+-----------+----------------+
| word       | frequency | corpus         |
+------------+-----------+----------------+
| falstaff   | 199       | 2kinghenryiv   |
| prince     | 192       | 1kinghenryiv   |
| richard    | 188       | kingrichardiii |
| gloucester | 182       | kingrichardiii |
| edward     | 181       | 3kinghenryvi   |
| falstaff   | 168       | 1kinghenryiv   |
| gloucester | 141       | kinglear       |
| richard    | 134       | kingrichardii  |
| warwick    | 122       | 3kinghenryvi   |
| cardinal   | 121       | kinghenryviii  |
+------------+-----------+----------------+
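To make the per-play point concrete, here is a minimal Python sketch that sums the ten result rows shown above by word. It is only an illustration: the totals cover just these ten rows, not the full data set, and the names `rows` and `totals` are made up for this example.

```python
from collections import Counter

# The ten result rows shown above: (word, frequency, corpus).
rows = [
    ("falstaff", 199, "2kinghenryiv"),
    ("prince", 192, "1kinghenryiv"),
    ("richard", 188, "kingrichardiii"),
    ("gloucester", 182, "kingrichardiii"),
    ("edward", 181, "3kinghenryvi"),
    ("falstaff", 168, "1kinghenryiv"),
    ("gloucester", 141, "kinglear"),
    ("richard", 134, "kingrichardii"),
    ("warwick", 122, "3kinghenryvi"),
    ("cardinal", 121, "kinghenryviii"),
]

# Add up the per-play counts for each word. The query itself does not
# do this, which is why some words appear in more than one row.
totals = Counter()
for word, frequency, _corpus in rows:
    totals[word] += frequency

for word, total in totals.most_common():
    print(word, total)
```

Words such as falstaff, gloucester, and richard each appear in two rows, so their combined totals are larger than any single row suggests.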

Now let's walk through the query, line by line:

SELECT LOWER(word) AS word, word_count AS frequency, corpus

All SQL queries that return data (and thus all BigQuery queries) start with the word SELECT, indicating that you're selecting data out of a table. After SELECT, you have comma-delimited field projections. These can be raw fields (like word_count), computed values (like LOWER(word), which transforms word to lowercase), or aggregation functions (see the next query). You can also choose the output name of a field by using the AS keyword. For example, word_count AS frequency renames word_count to frequency in the output. Fields with computed values that have no alias are assigned a unique generated field name, like f0_, which you may have seen in query results earlier in this chapter. In this query, when we lowercased the word field, we also assigned an alias so that it would get a user-friendly name rather than f0_.
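The effect of AS on output field names can be tried locally. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than BigQuery, with a hypothetical two-row table named shakespeare. SQLite picks its own name for an unaliased computed field, so it will not show f0_, but the aliased names behave as described above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the sample table; not BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shakespeare (word TEXT, word_count INTEGER, corpus TEXT)"
)
conn.executemany(
    "INSERT INTO shakespeare VALUES (?, ?, ?)",
    [("Falstaff", 199, "2kinghenryiv"), ("Prince", 192, "1kinghenryiv")],
)

# Same projections as the query being walked through, with aliases.
aliased = conn.execute(
    "SELECT LOWER(word) AS word, word_count AS frequency, corpus "
    "FROM shakespeare ORDER BY frequency DESC"
)
aliased_names = [col[0] for col in aliased.description]
aliased_rows = aliased.fetchall()
print(aliased_names)
print(aliased_rows)

# Without an alias, the engine chooses the name of the computed field.
# BigQuery would generate something like f0_; SQLite uses its own rule.
unaliased = conn.execute("SELECT LOWER(word), word_count FROM shakespeare")
unaliased_names = [col[0] for col in unaliased.description]
print(unaliased_names)
```

The first list of names comes back as word, frequency, and corpus, which is exactly what the AS clauses requested.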
