Database Reference
In-Depth Information
same shard. This enables the JOIN operation to proceed in the shards and
in parallel, in a method similar to the broadcast JOIN .
Now that we've discussed the basic operation, we can consider an example,
using the same query as in the broadcast JOIN example, but with JOIN
EACH instead of JOIN . Note that this query doesn't need to use a shuffled
JOIN , but it is useful to do so to compare its execution with the broadcast
JOIN version. Here is the query:
SELECT wiki.title
FROM [publicdata:samples.wikipedia] as wiki
JOIN EACH(
SELECT word
FROM [publicdata:samples.shakespeare]
GROUP EACH BY word) as shakes
ON wiki.title = shakes.word
The steps involved to execute this query are:
1. The mixer receives the query and plans the execution. This query
requires five phases:
a. The Shakespeare table on the right must be shuffled prior to the
GROUP BY .
b. The GROUP BY subquery on the right must be run to compute the
distinct Shakespeare words.
c. The right table must be shuffled prior to the JOIN .
d. The left table must be shuffled prior to the JOIN .
e. Finally, the JOIN results can be computed in the shards.
To start things off, the mixer dispatches the first part of the query (the
shuffle of the Shakespeare table) to the shards.
2. The shards receive the request to shuffle the
publicdata:samples.shakespeare table into 100 partitions based
on the word field. Each shard computes a hash key for each word value
and sends the hashed row to a different shard indicated by the hash. For
the word “peace” for example, the hash function returns the value
7146922576. This would get sent to shard 76, because there are 100
Search WWH ::




Custom Search