Source  Degree  Destinations
A       3       B
B       2       A
(a) Representation of M_11, connecting A and B to A and B

Source  Degree  Destinations
C       1       A
D       2       B
(b) Representation of M_12, connecting C and D to A and B

Source  Degree  Destinations
A       3       C, D
B       2       D
(c) Representation of M_21, connecting A and B to C and D

Source  Degree  Destinations
D       2       C
(d) Representation of M_22, connecting C and D to C and D

Figure 5.14: Sparse representation of the blocks of a matrix
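The tables of Figure 5.14 can be written down as a data structure. The following is a minimal sketch (the Python dicts and the names M11 through M22 are illustrative, not from the text): each block maps a source node to the pair (out-degree of the source, destinations of the source that fall within that block's stripe).

```python
# Sketch of the sparse block representation of Figure 5.14.
# Each block maps: source -> (out-degree of source,
#                             destinations of source within this stripe).
M11 = {"A": (3, ["B"]), "B": (2, ["A"])}       # connects A and B to A and B
M12 = {"C": (1, ["A"]), "D": (2, ["B"])}       # connects C and D to A and B
M21 = {"A": (3, ["C", "D"]), "B": (2, ["D"])}  # connects A and B to C and D
M22 = {"D": (2, ["C"])}                        # connects C and D to C and D
```

Note that the out-degree of a source is repeated in every block row that mentions that source (A has degree 3 in both M11 and M21), which is exactly the redundancy visible in the figure.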
The work reading M and v is thus the same as for the algorithm of Section 5.2.3, but the advantage of this approach is that each Map task can combine all the terms for the portion v′_i for which it is exclusively responsible. In other words, the Reduce tasks have nothing to do but to concatenate the pieces of v′ received from the k Map tasks.
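As a sketch of this point, the i-th Map task can be pictured as the following function (the name map_task_stripe and the dict-based representation are assumptions for illustration): it reads one row of blocks together with all of v and emits the completed stripe, so Reduce has only concatenation left to do.

```python
# Hypothetical sketch of the i-th Map task: it reads the blocks
# M_i1 .. M_ik (one row of blocks) and all of v, and produces the
# entire stripe v'_i by itself.
def map_task_stripe(blocks_row_i, v):
    v_prime_i = {}
    for block in blocks_row_i:
        for source, (degree, dests) in block.items():
            for dest in dests:
                # source contributes v[source]/degree to each destination
                v_prime_i[dest] = v_prime_i.get(dest, 0.0) + v[source] / degree
    return v_prime_i

# Stripe 1 of Figure 5.14 is produced from blocks M_11 and M_12:
M11 = {"A": (3, ["B"]), "B": (2, ["A"])}
M12 = {"C": (1, ["A"]), "D": (2, ["B"])}
v = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
stripe1 = map_task_stripe([M11, M12], v)  # stripe1["A"] == 0.375
```

With four equal initial values of 1/4, node A receives 1/4 divided by 2 from B (via M_11) plus 1/4 divided by 1 from C (via M_12), giving 3/8; no Reduce-side summation is needed.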
We can extend this idea to an environment in which map-reduce is not used. Suppose we have a single processor, with M and v stored on its disk, using the same sparse representation for M that we have discussed. We can first simulate the first Map task, the one that uses blocks M_11 through M_1k and all of v to compute v′_1. Then we simulate the second Map task, reading M_21 through M_2k and all of v to compute v′_2, and so on. As for the previous algorithms, we thus read M once and v k times. We can make k as small as possible, subject to the constraint that there is enough main memory to store 1/kth of v and 1/kth of v′, along with as small a portion of M as we can read from disk (typically, one disk block).
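The single-processor scheme can be sketched as follows (the function name and in-memory dicts are assumptions; in the real setting each row of blocks and each copy of v would be streamed from disk). Processing one row of blocks at a time means M is scanned once overall while v is scanned once per row, i.e., k times.

```python
# Hypothetical sketch of simulating the k Map tasks on one processor.
# block_rows[i] holds the row of blocks M_(i+1)1 .. M_(i+1)k, each in the
# sparse form {source: (degree, destinations)}.
def simulate_map_tasks(block_rows, v):
    v_prime = {}
    for row in block_rows:                # simulate one Map task
        stripe = {}
        for block in row:                 # scan M_i1 through M_ik
            for source, (degree, dests) in block.items():
                for dest in dests:
                    stripe[dest] = stripe.get(dest, 0.0) + v[source] / degree
        v_prime.update(stripe)            # "concatenate" stripe v'_i onto v'
    return v_prime

# The blocks of Figure 5.14, grouped by row of blocks:
blocks = [
    [{"A": (3, ["B"]), "B": (2, ["A"])},       # M_11
     {"C": (1, ["A"]), "D": (2, ["B"])}],      # M_12
    [{"A": (3, ["C", "D"]), "B": (2, ["D"])},  # M_21
     {"D": (2, ["C"])}],                       # M_22
]
v = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
v_prime = simulate_map_tasks(blocks, v)
```

Only the current stripe of v′ and the portion of v being scanned need be resident, which is what permits choosing k just large enough for 1/kth of v and v′ to fit in main memory.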