Database Reference
In-Depth Information
pute node and not use MapReduce at all. Thus, is smaller than
n
4
/q
, and if
q
is close to its
by a factor of in communication. Moreover, we can expect the difference in communic-
ation to be the significant cost difference. Both algorithms do the same
O
(
n
3
) arithmetic
operations. The two-pass method naturally has more overhead managing tasks than does
the one-job method. On the other hand, the second pass of the two-pass algorithm applies
a Reduce function that is associative and commutative. Thus, it might be possible to save
some communication cost by using a combiner on that pass.
2.6.8
Exercises for Section 2.6
EXERCISE
2.6.1 Describe the graphs that model the following problems.
(a) The multiplication of an
n
×
n
matrix by a vector of length
n
.
(b) The natural join of
R
(
A, B
) and
S
(
B, C
), where
A
,
B
, and
C
have domains of sizes
a
,
b
,
and
c
, respectively.
(c) The grouping and aggregation on the relation
R
(
A, B
), where
A
is the grouping attribute
and
B
is aggregated by the MAX operation. Assume
A
and
B
have domains of size
a
and
b
, respectively.
!
EXERCISE
2.6.2 Provide the details of the proof that a one-pass matrix-multiplication al-
gorithm requires replication rate at least
r
≥ 2
n
2
/q
, including:
(a) The proof that, for a fixed reducer size, the maximum number of outputs are covered
by a reducer when that reducer receives an equal number of rows of
M
and columns of
N
.
(b) The algebraic manipulation needed, starting with
!!
EXERCISE
2.6.3 Suppose our inputs are bit strings of length
b
, and the outputs correspond
to pairs of strings at Hamming distance 1.
11
(a) Prove that a reducer of size
q
can cover at most (
q/
2) log
2
q
outputs.
(b) Use part (a) to show the lower bound on replication rate:
r
≥
b/
log
2
q
.
(c) Show that there are algorithms with replication rate as given by part (b) for the cases
q
= 2,
q
= 2
b
, and
q
= 2
b/
2
.