Database Reference
In-Depth Information
Figure 9.3 Broadcast JOIN
You may see why there is a size limit for broadcast JOIN . First, the entire
hash table needs to fit in memory, so it is constrained by memory limitations
on the worker. Furthermore, the hash table also needs to be broadcast to
every shard in the Dremel cluster. If there are 5,000 shards and you have an
8 MB table, you've just sent 40 GB of traffic across the network.
Now that you've seen how broadcast JOIN works, we can walk through an
example:
SELECT wiki.title
FROM [publicdata:samples.wikipedia] AS wiki
JOIN (
SELECT word
FROM [publicdata:samples.shakespeare]
GROUP BY word
) AS shakes
ON wiki.title = shakes.word
This query joins the Wikipedia sample table with the Shakespeare sample
table. The Wikipedia table has one row for every update made to any
Wikipedia page. The only field we use in this query is title , which is the
title of the page that was edited. The query returns one row per revision to
each Wikipedia entry whose title is a word in a Shakespeare play.
The Wikipedia table is “large,” so it must go on the LEFT side of the JOIN .
The Shakespeare table is small, but it is made even smaller by performing
 
Search WWH ::




Custom Search