Database Reference
In-Depth Information
JOIN Key CARTESIAN-Product Explosion
One aspect of SQL JOIN s that often confuses people who are relatively
new to using them is what happens when there are multiple matches on
the left and the right. For instance, the following query joins the GitHub
and Shakespeare tables where the Shakespeare corpus name matches
the GitHub repository name and the Shakespeare corpus is The
Tempest :
SELECT shakespeare.corpus
FROM [publicdata:samples.github_timeline] github
JOIN [publicdata:samples.shakespeare] shakespeare
ON github.repository_name = shakespeare.corpus
WHERE shakespeare.corpus = "tempest"
There are 146 rows in the GitHub table where the repository name is
“tempest” and 3636 rows in the Shakespeare table where the corpus
name is the same. So when we join these two tables by matching the
repository name with the corpus, we get 146 * 3636 which is more than
one-half of a million rows.
It is easy to see that if you're not careful, when you run a JOIN
operation, you can get unintended explosions in the results. Imagine
joining a table of customers with a table of orders on the
customer_id . If both tables have a significant number of frequently
occurring values, the join could end up producing billions of results by
matching the commonly occurring value in one table with each instance
of the value in the other table. If you run a JOIN operation and
BigQuery tells you the results are too large to return, JOIN key
explosion may be the culprit.
Analytic and Windowing Functions
There are some data transformations that are awkward to express in SQL.
A good example is computing the rank of a record within a given group of
records. The basic problem is that the tuple relational calculus, the formal
framework underlying SQL, does not actually support ordered sequences
Search WWH ::




Custom Search