Database Reference
In-Depth Information
This query returns the most commonly used words (lowercased) longer
than five letters, the count, and the corpus in which they appear, in any
of Shakespeare's plays with “king” in the title. It doesn't, however, count
frequency across plays; it just returns the count of the word per-play. Here
are the results:
+------------+-----------+----------------+
| word | frequency | corpus |
+------------+-----------+----------------+
| falstaff | 199 | 2kinghenryiv |
| prince | 192 | 1kinghenryiv |
| richard | 188 | kingrichardiii |
| gloucester | 182 | kingrichardiii |
| edward | 181 | 3kinghenryvi |
| falstaff | 168 | 1kinghenryiv |
| gloucester | 141 | kinglear |
| richard | 134 | kingrichardii |
| warwick | 122 | 3kinghenryvi |
| cardinal | 121 | kinghenryviii |
+------------+-----------+----------------+
Now let's walk through the query, line by line:
SELECT LOWER(word) AS word, word_count AS frequency,
corpus
All SQL queries that return data (and thus all BigQuery queries) start with
the word
SELECT
, indicating that you're selecting data out of a table. After
SELECT
, you have comma-delimited field projections. These can be the
raw fields (like
word_count
), computed values (like
LOWER(word)
, which
transforms
word
to lowercase), or aggregation functions (see the next
query). You can also decide what you want the name of the field to be
by using the
AS
keyword. For example,
word_count AS frequency
renames
word_count
to
frequency
in the output. Fields with computed
values get assigned a unique field name, like
f0_
, which you may have seen
in query results earlier in this chapter. In this query when we lowercased
the
word
field, we also assigned an alias, so that it would get a user-friendly
name, rather than
f0_
.