will be $M(Mv_0) = M^2v_0$, and so on. In general, multiplying the initial vector $v_0$ by $M$ a total of $i$ times will give us the distribution of the surfer after $i$ steps.
To see why multiplying a distribution vector $v$ by $M$ gives the distribution $x = Mv$ at the next step, we reason as follows. The probability $x_i$ that a random surfer will be at node $i$ at the next step is $\sum_j m_{ij} v_j$. Here, $m_{ij}$ is the probability that a surfer at node $j$ will move to node $i$ at the next step (often 0 because there is no link from $j$ to $i$), and $v_j$ is the probability that the surfer was at node $j$ at the previous step.
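As a concrete illustration, here is a minimal Python sketch of one such step; the 2-node matrix is a made-up example, not one from the text:

```python
# One step of the surfer distribution: x = Mv, with x_i = sum_j m_ij * v_j.
def step(M, v):
    n = len(v)
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

# A hypothetical 2-node web; each column sums to 1, so M is stochastic.
M = [[0.0, 0.5],   # m_01 = 1/2: a surfer at node 1 moves to node 0 half the time
     [1.0, 0.5]]
v = [0.5, 0.5]     # uniform initial distribution
print(step(M, v))  # [0.25, 0.75]
```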
This sort of behavior is an example of the ancient theory of Markov processes. It is known that the distribution of the surfer approaches a limiting distribution $v$ that satisfies $v = Mv$, provided two conditions are met:
(1) The graph is strongly connected; that is, it is possible to get from any node to any other node.
(2) There are no dead ends: nodes that have no arcs out.
Note that Fig. 5.1 satisfies both these conditions.
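Since Fig. 5.1 is not reproduced here, the following sketch shows how the two conditions could be checked on any directed graph given as adjacency lists; the three-node graph at the bottom is hypothetical:

```python
def reachable(adj, start):
    """Set of nodes reachable from start by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def strongly_connected(adj):
    # Strongly connected iff one node reaches every node and every node
    # reaches it; we test the first node against the graph and its reversal.
    nodes = list(adj)
    rev = {u: [] for u in nodes}
    for u in nodes:
        for w in adj[u]:
            rev[w].append(u)
    s = nodes[0]
    return reachable(adj, s) == set(nodes) == reachable(rev, s)

def no_dead_ends(adj):
    return all(adj[u] for u in adj)   # every node has at least one arc out

adj = {'A': ['B', 'C'], 'B': ['A'], 'C': ['A']}    # hypothetical graph
print(strongly_connected(adj), no_dead_ends(adj))  # True True
```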
The limit is reached when multiplying the distribution by $M$ another time does not change the distribution. In other words, the limiting $v$ is an eigenvector of $M$ (an eigenvector of a matrix $M$ is a vector $v$ that satisfies $Mv = \lambda v$ for some constant eigenvalue $\lambda$). In fact, because $M$ is stochastic, meaning that its columns each add up to 1, $v$ is the principal eigenvector (the eigenvector whose associated eigenvalue is the largest of all eigenvalues). Note also that, because $M$ is stochastic, the eigenvalue associated with the principal eigenvector is 1.
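These claims can be checked numerically; below is a small sketch using NumPy on a made-up 3-node stochastic matrix (not the matrix of Example 5.1):

```python
import numpy as np

# A hypothetical column-stochastic matrix: each column sums to 1.
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

vals, vecs = np.linalg.eig(M)
k = np.argmax(vals.real)        # index of the principal eigenvalue
print(round(vals[k].real, 6))   # 1.0, as claimed for stochastic matrices
v = vecs[:, k].real
v /= v.sum()                    # rescale to a probability distribution
print(np.round(v, 4))           # fixed point: M @ v equals v
```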
The principal eigenvector of $M$ tells us where the surfer is most likely to be after a long time. Recall that the intuition behind PageRank is that the more likely a surfer is to be at a page, the more important the page is. We can compute the principal eigenvector of $M$ by starting with the initial vector $v_0$ and multiplying by $M$ some number of times, until the vector we get shows little change at each round. In practice, for the Web itself, 50-75 iterations are sufficient to converge to within the error limits of double-precision arithmetic.
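The procedure just described is ordinary power iteration. A minimal sketch, reusing the hypothetical matrix from the previous example and an illustrative tolerance of 1e-12:

```python
import numpy as np

def power_iteration(M, tol=1e-12, max_iters=100):
    """Multiply the uniform vector v0 by M until the change is below tol."""
    n = M.shape[0]
    v = np.full(n, 1.0 / n)              # initial vector v0, each entry 1/n
    for i in range(1, max_iters + 1):
        nxt = M @ v
        if np.abs(nxt - v).max() < tol:  # little change: we are at the limit
            return nxt, i
        v = nxt
    return v, max_iters

M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v, iters = power_iteration(M)
print(np.round(v, 4), iters)             # the principal eigenvector, v = Mv
```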
EXAMPLE 5.2 Suppose we apply the process described above to the matrix $M$ from Example 5.1. Since there are four nodes, the initial vector $v_0$ has four components, each 1/4. The sequence of approximations to the limit that we get by multiplying at each step by $M$ is:

$$\begin{pmatrix}1/4\\ 1/4\\ 1/4\\ 1/4\end{pmatrix},\quad \begin{pmatrix}9/24\\ 5/24\\ 5/24\\ 5/24\end{pmatrix},\quad \begin{pmatrix}15/48\\ 11/48\\ 11/48\\ 11/48\end{pmatrix},\quad \begin{pmatrix}11/32\\ 7/32\\ 7/32\\ 7/32\end{pmatrix},\quad \ldots,\quad \begin{pmatrix}3/9\\ 2/9\\ 2/9\\ 2/9\end{pmatrix}$$
Notice that in this example, the probabilities for $B$, $C$, and $D$ remain the same. It is easy to see that $B$ and $C$ must always have the same values at any iteration, because their rows in $M$ are identical. To show that their values are also the same as the value for $D$, an inductive proof works, and we leave it as an exercise.
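For reference, the iteration of Example 5.2 can be reproduced exactly with rational arithmetic. The matrix below assumes the link structure of Example 5.1 as usually given ($A$ links to $B$, $C$, and $D$; $B$ to $A$ and $D$; $C$ to $A$; $D$ to $B$ and $C$); since that figure is not reproduced in this excerpt, treat the matrix as an assumption:

```python
from fractions import Fraction as F

# Assumed transition matrix of Example 5.1; rows/columns ordered A, B, C, D.
# Note that the rows for B and C are identical, as the text observes.
M = [[F(0),    F(1, 2), F(1), F(0)],
     [F(1, 3), F(0),    F(0), F(1, 2)],
     [F(1, 3), F(0),    F(0), F(1, 2)],
     [F(1, 3), F(1, 2), F(0), F(0)]]

v = [F(1, 4)] * 4                     # v0 = (1/4, 1/4, 1/4, 1/4)
for _ in range(3):
    v = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
    print([str(x) for x in v])
# ['3/8', '5/24', '5/24', '5/24']      (i.e. 9/24, 5/24, 5/24, 5/24)
# ['5/16', '11/48', '11/48', '11/48']  (i.e. 15/48, 11/48, 11/48, 11/48)
# ['11/32', '7/32', '7/32', '7/32']    -> the limit is (3/9, 2/9, 2/9, 2/9)
```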