Database Reference
In-Depth Information
passed from Map tasks to Reduce tasks. In this case, the striping approach discussed
in Section 2.3.1 is not sufficient to avoid heavy use of disk (thrashing).
We discuss the solution to these two problems in this section.
5.2.1
Representing Transition Matrices
The transition matrix is very sparse, since the average Web page has about 10 out-links. If,
say, we are analyzing a graph of ten billion pages, then only one in a billion entries is not 0.
The proper way to represent any sparse matrix is to list the locations of the nonzero entries
and their values. If we use 4-byte integers for coordinates of an element and an 8-byte
double-precision number for the value, then we need 16 bytes per nonzero entry. That is,
the space needed is linear in the number of nonzero entries, rather than quadratic in the side
of the matrix.
However, for a transition matrix of the Web, there is one further compression that we can
do. If we list the nonzero entries by column, then we know what each nonzero entry is; it is
1 divided by the out-degree of the page. We can thus represent a column by one integer for
the out-degree, and one integer per nonzero entry in that column, giving the row number
where that entry is located. Thus, we need slightly more than 4 bytes per nonzero entry to
represent a transition matrix.
EXAMPLE 5.7 Let us reprise the example Web graph from Fig. 5.1 , whose transition matrix
is
Recall that the rows and columns represent nodes A , B , C , and D , in that order. In Fig. 5.11
is a compact representation of this matrix. 5
Figure 5.11 Represent a transition matrix by the out-degree of each node and the list of its successors
For instance, the entry for A has degree 3 and a list of three successors. From that row of
Fig. 5.11 we can deduce that the column for A in matrix M has 0 in the row for A (since it is
not on the list of destinations) and 1/3 in the rows for B , C , and D . We know that the value
is 1/3 because the degree column in Fig. 5.11 tells us there are three links out of A .
Search WWH ::




Custom Search