Database Reference
In-Depth Information
is read k times, where the parameter k is chosen so that 1/ k th of the vectors v and v ′ can be
held in main memory. Recall that the algorithm of Section 5.2.3 uses k 2 processors, assum-
ing all Map tasks are executed in parallel at different processors.
We can assign all the blocks in one row of blocks to a single Map task, and thus reduce
the number of Map tasks to k . For instance, in Fig. 5.12 , M 11 , M 12 , M 13 , and M 14 would
be assigned to a single Map task. If we represent the blocks as in Fig. 5.14 , we can read
the blocks in a row of blocks one-at-a-time, so the matrix does not consume a significant
amount of main-memory. At the same time that we read M ij , we must read the vector stripe
v j . As a result, each of the k Map tasks reads the entire vector v , along with 1/ k th of the
matrix.
The work reading M and v is thus the same as for the algorithm of Section 5.2.3 , but the
advantage of this approach is that each Map task can combine all the terms for the portion
for which it is exclusively responsible. In other words, the Reduce tasks have nothing to
do but to concatenate the pieces of v ′ received from the k Map tasks.
We can extend this idea to an environment in which MapReduce is not used. Suppose
we have a single processor, with M and v stored on its disk, using the same sparse repres-
entation for M that we have discussed. We can first simulate the first Map task, the one that
uses blocks M 11 through M 1 k and all of v to compute Then we simulate the second Map
task, reading M 21 through M 2 k and all of v to compute and so on. As for the previous al-
gorithms, we thus read M once and v k times. We can make k as small as possible, subject
to the constraint that there is enough main memory to store 1/ k th of v and 1/ k th of v ′, along
with as small a portion of M as we can read from disk (typically, one disk block).
5.2.6
Exercises for Section 5.2
EXERCISE 5.2.1 Suppose we wish to store an n × n boolean matrix (0 and 1 elements only).
We could represent it by the bits themselves, or we could represent the matrix by listing the
positions of the 1s as pairs of integers, each integer requiring log 2 n bits. The former is
suitable for dense matrices; the latter is suitable for sparse matrices. How sparse must the
matrix be (i.e., what fraction of the elements should be 1s) for the sparse representation to
save space?
EXERCISE 5.2.2 Using the method of Section 5.2.1 , represent the transition matrices of the
following graphs:
(a) Figure 5.4 .
(b) Figure 5.7 .
Search WWH ::




Custom Search