Databases Reference
In-Depth Information
An alternative strategy is based on partitioning the matrix into k 2 blocks,
while the vectors are still partitioned into k stripes. A picture, showing the
division for k = 4, is in Fig. 5.12. Note that we have not shown the multiplica-
tion of the matrix by β or the addition of (1−β)e/n, because these steps are
straightforward, regardless of the strategy we use.
v
v
'
M
M 12
M
M
1
1
11
13
14
v
v
'
M
M
M
M
2
2
21
22
23
24
=
v
v
'
M
M
M
M
3
3
31
32
33
34
v
v
'
M
M
M
M
4
4
41
42
43
44
Figure 5.12: Partitioning a matrix into square blocks
In this method, we use k 2 Map tasks. Each task gets one square of the
matrix M , say M ij , and one stripe of the vector v, which must be v j . Notice
that each stripe of the vector is sent to k different Map tasks; v j is sent to the
task handling M ij for each of the k possible values of i. Thus, v is transmitted
over the network k times. However, each piece of the matrix is sent only once.
Since the size of the matrix, properly encoded as described in Section 5.2.1, can
be expected to be several times the size of the vector, the transmission cost is
not too much greater than the minimum possible. And because we are doing
considerable combining at the Map tasks, we save as data is passed from the
Map tasks to the Reduce tasks.
The advantage of this approach is that we can keep both the jth stripe of
v and the ith stripe of v in main memory as we process M ij . Note that all
terms generated from M ij and v j contribute to v i
and no other stripe of v .
5.2.4
Representing Blocks of the Transition Matrix
Since we are representing transition matrices in the special way described in
Section 5.2.1, we need to consider how the blocks of Fig. 5.12 are represented.
Unfortunately, the space required for a column of blocks (a “stripe” as we called
it earlier) is greater than the space needed for the stripe as a whole, but not
too much greater.
For each block, we need data about all those columns that have at least one
nonzero entry within the block. If k, the number of stripes in each dimension,
is large, then most columns will have nothing in most blocks of its stripe. For
a given block, we not only have to list those rows that have a nonzero entry for
that column, but we must repeat the out-degree for the node represented by
the column. Consequently, it is possible that the out-degree will be repeated as
many times as the out-degree itself. That observation bounds from above the
Search WWH ::




Custom Search