Database Reference
In-Depth Information
GROUP
EACH
BY title
ORDER BY cnt DESC
LIMIT 100
JOIN EACH
If you don't specify
EACH
when you perform a
JOIN
operation, BigQuery
requires that the table on the right side of the join be small enough to send
the entire thing to every node of the computation tree. This means that if
the table on the right side of the join doesn't fit into less than 8 MB when
compressed, your query will fail. For example, if you take the query from the
OUTER JOIN
example and reverse the left and right tables and switch to an
INNER JOIN
, you get:
SELECT github.repository_name,
github.repository_owner,
shakespeare.corpus_date
FROM (
SELECT corpus, max(corpus_date) AS corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY corpus) shakespeare
INNER JOIN [publicdata:samples.github_timeline] github
ON shakespeare.corpus = github.repository_name
This query fails with the error “The JOIN operator's right-side table must
be a small table.” This isn't as big of a limitation as it sounds because many
JOIN
queries use a subselect to narrow down the table that is joined against;
as long as the computed result is smaller than 8 MB, the
JOIN
will succeed.
However, when that is not possible you can fix the query by adding the
EACH
keyword to the join:
SELECT github.repository_name,
github.repository_owner,
shakespeare.corpus_date
FROM (
SELECT corpus, max(corpus_date) AS corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY corpus) shakespeare
INNER JOIN
EACH
[publicdata:samples.github_timeline]