will be $M(Mv_0) = M^2 v_0$, and so on. In general, multiplying the initial vector $v_0$ by $M$ a total of $i$ times will give us the distribution of the surfer after $i$ steps.
To see why multiplying a distribution vector $v$ by $M$ gives the distribution $x = Mv$ at the next step, we reason as follows. The probability $x_i$ that a random surfer will be at node $i$ at the next step is $\sum_j m_{ij} v_j$. Here, $m_{ij}$ is the probability that a surfer at node $j$ will move to node $i$ at the next step (often 0 because there is no link from $j$ to $i$), and $v_j$ is the probability that the surfer was at node $j$ at the previous step.
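As a small illustration of this rule, the sketch below computes $x = Mv$ one entry at a time, exactly as the sum above prescribes. The 3-node transition matrix and starting distribution are made up for illustration; they are not taken from the text.

```python
# One step of the random-surfer update x = M v, computed entrywise as
# x_i = sum_j m_ij * v_j.  Column j of M lists where a surfer at node j
# goes next; the matrix and vector are hypothetical.

M = [
    [0.0, 0.5, 1.0],   # row i = 0: probabilities of arriving at node 0
    [0.5, 0.0, 0.0],   # row i = 1
    [0.5, 0.5, 0.0],   # row i = 2
]
v = [1/3, 1/3, 1/3]    # current distribution of the surfer

n = len(v)
x = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

print(x)               # distribution at the next step; entries still sum to 1
```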
This sort of behavior is an example of the ancient theory of Markov processes. It is known that the distribution of the surfer approaches a limiting distribution $v$ that satisfies $v = Mv$, provided two conditions are met:
(1) The graph is strongly connected; that is, it is possible to get from any node to any other node.
(2) There are no dead ends: nodes that have no arcs out.
Note that Fig. 5.1 satisfies both these conditions.
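Both conditions are easy to test mechanically. The following sketch, written over an invented adjacency-list graph (not the one in Fig. 5.1), flags dead ends and checks strong connectivity by requiring every node to be reachable from a fixed node both in the graph and in its reverse.

```python
# Check the two convergence conditions: no dead ends, strong connectivity.
# The adjacency lists are hypothetical, purely for illustration.
graph = {
    'A': ['B', 'C', 'D'],
    'B': ['A', 'D'],
    'C': ['A'],
    'D': ['B', 'C'],
}

def dead_ends(g):
    """Nodes with no outgoing arcs."""
    return [node for node, out in g.items() if not out]

def reachable(g, start):
    """All nodes reachable from start, found by depth-first search."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(g[node])
    return seen

def strongly_connected(g):
    """True iff some fixed node reaches, and is reached from, every node."""
    start = next(iter(g))
    reversed_g = {node: [] for node in g}
    for node, outs in g.items():
        for dest in outs:
            reversed_g[dest].append(node)
    nodes = set(g)
    return reachable(g, start) == nodes and reachable(reversed_g, start) == nodes

print(dead_ends(graph))           # [] -- every node has at least one out-arc
print(strongly_connected(graph))  # True
```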
The limit is reached when multiplying the distribution by $M$ another time does not change the distribution. In other terms, the limiting $v$ is an eigenvector of $M$ (an eigenvector of a matrix $M$ is a vector $v$ that satisfies $v = \lambda Mv$ for some constant eigenvalue $\lambda$). In fact, because $M$ is stochastic, meaning that its columns each add up to 1, $v$ is the principal eigenvector (its associated eigenvalue is the largest of all eigenvalues). Note also that, because $M$ is stochastic, the eigenvalue associated with the principal eigenvector is 1.
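This claim can be checked numerically. The short NumPy sketch below uses a made-up column-stochastic matrix (not one from the text), extracts the eigenvalue of largest magnitude, and confirms it is 1, so the corresponding eigenvector $v$, rescaled to be a probability distribution, satisfies $v = Mv$.

```python
import numpy as np

# A made-up column-stochastic matrix: every column sums to 1.
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])

eigenvalues, eigenvectors = np.linalg.eig(M)

k = np.argmax(np.abs(eigenvalues))   # index of the principal eigenvalue
principal_value = eigenvalues[k].real
v = eigenvectors[:, k].real
v = v / v.sum()                      # rescale to a probability distribution

print(principal_value)               # 1.0 (up to rounding), as expected
print(np.allclose(M @ v, v))         # True: v is unchanged by another step
```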
The principal eigenvector of $M$ tells us where the surfer is most likely to be after a long time. Recall that the intuition behind PageRank is that the more likely a surfer is to be at a page, the more important the page is. We can compute the principal eigenvector of $M$ by starting with the initial vector $v_0$ and multiplying by $M$ some number of times, until the vector we get shows little change at each round. In practice, for the Web itself, 50-75 iterations are sufficient to converge to within the error limits of double-precision arithmetic.
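A direct rendering of this procedure is sketched below: repeated multiplication by a column-stochastic matrix, stopping once successive vectors differ by less than a small tolerance. The matrix, the tolerance, and the iteration cap are illustrative choices, not values from the text.

```python
import numpy as np

# Power iteration: start from the uniform distribution v0 and keep
# multiplying by M until the vector stops changing appreciably.
M = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])                       # made-up column-stochastic transition matrix
n = M.shape[0]
v = np.full(n, 1.0 / n)  # v0: surfer equally likely to start anywhere

for step in range(75):   # a few dozen rounds suffice in practice
    next_v = M @ v
    if np.abs(next_v - v).max() < 1e-12:
        break
    v = next_v

print(step, v)           # v approximates the principal eigenvector: v = M v
```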
EXAMPLE 5.2 Suppose we apply the process described above to the matrix $M$ for the graph of Fig. 5.1. Since there are four nodes, the initial vector $v_0$ has four components, each 1/4. The sequence of approximations to the limit that we get by multiplying at each step by $M$ is:
Notice that in this example, the probabilities for $B$, $C$, and $D$ remain the same. It is easy to see that $B$ and $C$ must always have the same values at any iteration, because their rows in $M$ are identical. To show that their values are also the same as the value for $D$, an inductive