Database Reference
In-Depth Information
(1) Prove an upper bound on how many outputs a reducer with q inputs can cover. Call
this bound g ( q ). This step can be difficult, but for examples like similarity join, it is
actually quite simple.
(2) Determine the total number of outputs produced by the problem.
(3) Suppose that there are k reducers, and the i th reducer has q i < q inputs. Observe that
must be no less than the number of outputs computed in step (2).
(4) Manipulate the inequality from (3) to get a lower bound on Often, the trick used
at this step is to replace some factors of q i by their upper bound q , but leave a single
factor of q i in the term for i .
(5) Since is the total communication from Map tasks to Reduce tasks, divide the upper
bound from (4) on this quantity by the number of inputs. The result is the replication
rate.
EXAMPLE 2.17 This sequence of steps may seem mysterious, but let us consider the simil-
arity join as an example that we hope will make things clear. Recall that in Example 2.15
we gave an upper bound on the replication rate r of r ≤ 2 p/q , where p was the number of in-
puts and q was the reducer size. We shall show a lower bound on r that is half that amount,
which implies that, although improvements to the algorithm might be possible, any reduc-
tion in communication for a given reducer size will be by a factor of 2 at most.
For step (1), observe that if a reducer gets q inputs, it cannot cover more than or
approximately q 2 / 2 outputs. For step (2), we know there are a total of or approximately
p 2 / 2 outputs that each must be covered. The inequality constructed at step (3) is thus
or, multiplying both sides by 2,
(2.1)
Now, we must do the manipulation of step (4). Following the hint, we note that there are
two factors of q i in each term on the left of Equation ( 2.1 ), so we replace one factor by q
and leave the other as q i . Since q q i , we can only increase the left side by doing so, and
thus the inequality continues to hold:
or, dividing by q :
Search WWH ::




Custom Search