Database Reference
In-Depth Information
shuffle partitions and 7146922576 % 100 = 76. After the shuffle is
complete, the shards return to the mixer indicating they are finished.
(Note that they do not return any results at this point.) The temporary
table with the results is given the name
__table1
.
3. Once the Shakespeare table rows have been sorted by
word
value into
distinct partitions, the
GROUP BY
operation from the
RIGHT
subquery
can be done entirely in the shards. The shards receive the query
SELECT
word FROM [__table1] GROUP BY word
. The results of this
subuery are stored in table
__table2
.
4. The mixer now kicks off a new shuffle on the left table by instructing the
shards to shuffle the
[publicdata:samples.wikipedia]
table into
500 partitions based on the
title
field. The results are saved in a
temporary table called
__table3
.
5. Now that all the preparatory work is done, the two temporary tables can
be joined. Because the
LEFT
and
RIGHT
tables have been sorted by the
field they are being joined by, the work of the
JOIN
can be done in
parallel on the shards. The mixer sends
SELECT wiki.title FROM
__table3 AS wiki JOIN __table2 AS shakes ON
wiki.title = shakes. word
to the shards. The
__table2
and
__table3
tables are the outputs of the shuffle phases that have been
patiently waiting to be joined. After computing the
JOIN
, the results are
returned to the mixer.
6. The mixer doesn't need to do any aggregation in this query, so it just
returns all the results to the caller.