Database Reference
In-Depth Information
against every row in the second table. The
ON
clause (required in BigQuery
SQL) is a kind of filter that keeps only those rows where fields in the first
table match fields in the second. BigQuery supports only equijoins, which is
a fancy way of saying that you can perform a
JOIN
only when the values on
the first table exactly match the values on the second. If the fields require
coercion, you can usually use a nested
SELECT
, which is introduced in query
#4, to make the values match up correctly.
When performing a
JOIN
, the
JOIN
clause follows the
FROM
clause:
FROM [publicdata:samples.shakespeare] AS shakespeare
JOIN [bigquery-e2e:reference.word_frequency] AS english
ON shakespeare.word = english.word
The
FROM
clause looks slightly different too—we've added a table alias.
Because you now have more than one table involved in your query, you need
to have a way to tell which table you're talking about when you reference
a field. To do this, you can qualify the field name with the table name. For
example, this query uses
english.count
to refer to the field
count
in the
table with the alias
english
.
The
JOIN
clause is a lot like a
FROM
clause; you provide the name of the table
you are joining against (or a nested
SELECT
statement, which is described
in the next query). BigQuery supports both
INNER
and
OUTER
joins, but the
description of these is deferred to chapter 10.
A feature that can come in handy when you join two large tables is the
EACH
keyword (which is not standard SQL). You can think of
EACH
as a hint to
the BigQuery query optimizer that tells it you have two large tables. Chapter
9 gives more background on how this optimization works, but for now just
remember that
JOIN EACH
can be useful for joining two large tables.
The results of this query are here:
+-----------+----------+
| word | rel_freq |
+-----------+----------+
| villany | 49.0 |
| pass'd | 34.0 |
| wrong'd | 31.0 |