Database Reference
In-Depth Information
against every row in the second table. The ON clause (required in BigQuery
SQL) is a kind of filter that keeps only those rows where fields in the first
table match fields in the second. BigQuery supports only equijoins, which is
a fancy way of saying that you can perform a JOIN only when the values on
the first table exactly match the values on the second. If the fields require
coercion, you can usually use a nested SELECT , which is introduced in query
#4, to make the values match up correctly.
When performing a JOIN , the JOIN clause follows the FROM clause:
FROM [publicdata:samples.shakespeare] AS shakespeare
JOIN [bigquery-e2e:reference.word_frequency] AS english
ON shakespeare.word = english.word
The FROM clause looks slightly different too—we've added a table alias.
Because you now have more than one table involved in your query, you need
to have a way to tell which table you're talking about when you reference
a field. To do this, you can qualify the field name with the table name. For
example, this query uses english.count to refer to the field count in the
table with the alias english .
The JOIN clause is a lot like a FROM clause; you provide the name of the table
you are joining against (or a nested SELECT statement, which is described
in the next query). BigQuery supports both INNER and OUTER joins, but the
description of these is deferred to chapter 10.
A feature that can come in handy when you join two large tables is the EACH
keyword (which is not standard SQL). You can think of EACH as a hint to
the BigQuery query optimizer that tells it you have two large tables. Chapter
9 gives more background on how this optimization works, but for now just
remember that JOIN EACH can be useful for joining two large tables.
The results of this query are here:
+-----------+----------+
| word | rel_freq |
+-----------+----------+
| villany | 49.0 |
| pass'd | 34.0 |
| wrong'd | 31.0 |
Search WWH ::




Custom Search