Link Analysis - Mining of Massive Datasets

Database Reference

In-Depth Information

is read k times, where the parameter k is chosen so that 1/ k th of the vectors v and v ′ can be

held in main memory. Recall that the algorithm of Section 5.2.3 uses k 2 processors, assum-

ing all Map tasks are executed in parallel at different processors.

We can assign all the blocks in one row of blocks to a single Map task, and thus reduce

the number of Map tasks to k . For instance, in Fig. 5.12 , M 11 , M 12 , M 13 , and M 14 would

be assigned to a single Map task. If we represent the blocks as in Fig. 5.14 , we can read

the blocks in a row of blocks one-at-a-time, so the matrix does not consume a significant

amount of main-memory. At the same time that we read M ij , we must read the vector stripe

v j . As a result, each of the k Map tasks reads the entire vector v , along with 1/ k th of the

matrix.

The work reading M and v is thus the same as for the algorithm of Section 5.2.3 , but the

advantage of this approach is that each Map task can combine all the terms for the portion

for which it is exclusively responsible. In other words, the Reduce tasks have nothing to

do but to concatenate the pieces of v ′ received from the k Map tasks.

We can extend this idea to an environment in which MapReduce is not used. Suppose

we have a single processor, with M and v stored on its disk, using the same sparse repres-

entation for M that we have discussed. We can first simulate the first Map task, the one that

uses blocks M 11 through M 1 k and all of v to compute Then we simulate the second Map

task, reading M 21 through M 2 k and all of v to compute and so on. As for the previous al-

gorithms, we thus read M once and v k times. We can make k as small as possible, subject

to the constraint that there is enough main memory to store 1/ k th of v and 1/ k th of v ′, along

with as small a portion of M as we can read from disk (typically, one disk block).

5.2.6

Exercises for Section 5.2

EXERCISE 5.2.1 Suppose we wish to store an n × n boolean matrix (0 and 1 elements only).

We could represent it by the bits themselves, or we could represent the matrix by listing the

positions of the 1s as pairs of integers, each integer requiring ⌈ log 2 n ⌉ bits. The former is

suitable for dense matrices; the latter is suitable for sparse matrices. How sparse must the

matrix be (i.e., what fraction of the elements should be 1s) for the sparse representation to

save space?

EXERCISE 5.2.2 Using the method of Section 5.2.1 , represent the transition matrices of the

following graphs:

(a) Figure 5.4 .

(b) Figure 5.7 .

Search WWH ::

Custom Search

Home