To avoid the problem illustrated by Example 5.5, we modify the calculation of PageRank by allowing each random surfer a small probability of teleporting to a random page, rather than following an out-link from their current page. The iterative step, where we compute a new vector estimate of PageRanks v′ from the current PageRank estimate v and the transition matrix M, is

v′ = βMv + (1 − β)e/n

where β is a chosen constant, usually in the range 0.8 to 0.9, e is a vector of all 1s with the appropriate number of components, and n is the number of nodes in the Web graph.
The term βMv represents the case where, with probability β, the random surfer decides to follow an out-link from their present page. The term (1 − β)e/n is a vector each of whose components has value (1 − β)/n; it represents the introduction, with probability 1 − β, of a new random surfer at a random page.
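A minimal sketch of this iteration in Python, using NumPy. The 4-node transition matrix below is an illustrative graph of our own, not one taken from the text; column j holds the out-link probabilities of page j, so each column sums to 1 when the graph has no dead ends.

```python
import numpy as np

def pagerank(M, beta=0.8, iterations=50):
    """Iterate v' = beta*M*v + (1-beta)*e/n from the uniform starting vector."""
    n = M.shape[0]
    v = np.full(n, 1.0 / n)        # start with each page equally likely
    teleport = (1.0 - beta) / n    # each component of the vector (1-beta)e/n
    for _ in range(iterations):
        v = beta * (M @ v) + teleport
    return v

# Hypothetical column-stochastic transition matrix for a 4-page graph.
M = np.array([[0.0, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0]])

v = pagerank(M)
print(v, v.sum())   # with no dead ends, the components sum to 1
```

Since each column of M sums to 1 here, every iteration preserves the total: β·1 + (1 − β) = 1.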
Note that if the graph has no dead ends, then the probability of introducing a new random
surfer is exactly equal to the probability that the random surfer will decide
not
to follow a
link from their current page. In this case, it is reasonable to visualize the surfer as deciding
either to follow a link or teleport to a random page. However, if there are dead ends, then
there is a third possibility, which is that the surfer goes nowhere. Since the term (1 − β)e/n does not depend on the sum of the components of the vector v, there will always be some fraction of a surfer operating on the Web. That is, when there are dead ends, the sum of the components of v may be less than 1, but it will never reach 0.
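To see the effect of a dead end concretely, here is a small sketch; the 3-page graph is hypothetical, not from the text. The dead-end page contributes a column of zeros, so M is substochastic and PageRank leaks at every step, yet the teleport term keeps the total above 1 − β.

```python
import numpy as np

beta = 0.8
# Page C (column 2) is a dead end: its column is all zeros, so some
# PageRank is lost on every iteration.
M = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
n = M.shape[0]
v = np.full(n, 1.0 / n)
for _ in range(50):
    v = beta * (M @ v) + (1 - beta) / n
print(v.sum())   # less than 1, but never below (1 - beta) = 0.2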
EXAMPLE 5.6 Let us see how the new approach to computing PageRank fares on the graph of Fig. 5.6. We shall use β = 0.8 in this example. Thus, the equation for the iteration becomes

v′ = (4/5)Mv + (1/20)e

Notice that we have incorporated the factor β into M by multiplying each of its elements by 4/5. The components of the vector (1 − β)e/n are each 1/20, since 1 − β = 1/5 and n = 4.
Iterating from the uniform starting vector, the estimates converge to the limit (15/148, 19/148, 95/148, 19/148). By being a spider trap, C has managed to get more than half of the PageRank for itself. However, the effect has been limited, and each of the nodes gets some of the PageRank. □
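As a numerical check, the iteration of Example 5.6 can be run directly. The transition matrix below is our reconstruction of the graph of Fig. 5.6, which is not reproduced in this excerpt; we assume the spider trap C links only to itself, while A links to B, C, and D, B links to A and D, and D links to B and C.

```python
import numpy as np

# Assumed transition matrix for the four pages A, B, C, D of Fig. 5.6;
# column j gives the out-link probabilities of page j, and C (column 2)
# is a spider trap linking only to itself.
M = np.array([[0,     1 / 2, 0, 0],
              [1 / 3, 0,     0, 1 / 2],
              [1 / 3, 0,     1, 1 / 2],
              [1 / 3, 1 / 2, 0, 0]])
beta = 0.8
n = 4
v = np.full(n, 1 / n)                  # uniform starting vector
for _ in range(100):
    v = beta * (M @ v) + (1 - beta) / n
print(v)   # ≈ [0.1014, 0.1284, 0.6419, 0.1284] = (15, 19, 95, 19)/148
```

With these assumptions the iterates converge to (15/148, 19/148, 95/148, 19/148): the spider trap C captures about 64% of the PageRank rather than all of it.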