MapReduce and the New Software Stack - Mining of Massive Datasets

Database Reference

In-Depth Information

pute node and not use MapReduce at all. Thus, is smaller than n 4 /q , and if q is close to its

minimum possible value of 2 n , 10 then the two-pass algorithm beats the one-pass algorithm

by a factor of in communication. Moreover, we can expect the difference in communic-

ation to be the significant cost difference. Both algorithms do the same O ( n 3 ) arithmetic

operations. The two-pass method naturally has more overhead managing tasks than does

the one-job method. On the other hand, the second pass of the two-pass algorithm applies

a Reduce function that is associative and commutative. Thus, it might be possible to save

some communication cost by using a combiner on that pass.

2.6.8

Exercises for Section 2.6

EXERCISE 2.6.1 Describe the graphs that model the following problems.

(a) The multiplication of an n × n matrix by a vector of length n .

(b) The natural join of R ( A, B ) and S ( B, C ), where A , B , and C have domains of sizes a , b ,

and c , respectively.

(c) The grouping and aggregation on the relation R ( A, B ), where A is the grouping attribute

and B is aggregated by the MAX operation. Assume A and B have domains of size a

and b , respectively.

! EXERCISE 2.6.2 Provide the details of the proof that a one-pass matrix-multiplication al-

gorithm requires replication rate at least r ≥ 2 n 2 /q , including:

(a) The proof that, for a fixed reducer size, the maximum number of outputs are covered

by a reducer when that reducer receives an equal number of rows of M and columns of

N .

(b) The algebraic manipulation needed, starting with

!! EXERCISE 2.6.3 Suppose our inputs are bit strings of length b , and the outputs correspond

to pairs of strings at Hamming distance 1. 11

(a) Prove that a reducer of size q can cover at most ( q/ 2) log 2 q outputs.

(b) Use part (a) to show the lower bound on replication rate: r ≥ b/ log 2 q .

(c) Show that there are algorithms with replication rate as given by part (b) for the cases q

= 2, q = 2 b , and q = 2 b/ 2 .

Search WWH ::

Custom Search

Home