Information Technology Reference
In-Depth Information
Fig. 1.3 Illustration of
PageRank
page uniformly. This small probability can be represented by ( 1
α) , where α is
called the damping factor. Accordingly, the PageRank model is refined as follows 9 :
α
d v
PR(d v )
U(d v ) +
( 1
α)
PR(d u )
=
,
(1.5)
N
B u
where N is the total number of pages on the Web.
The above process of computing PageRank can be vividly illustrated by Fig. 1.3 .
By using the above formula, it is not difficult to discover that the PageRank values
of the seven nodes in the graph are 0.304, 0.166, 0.141, 0.105, 0.179, 0.045, and
0.060, respectively. The sizes of the circles in the figure are used to represent the
PageRank values of the nodes.
Many algorithms have been developed in order to further improve the accuracy
and efficiency of PageRank. Some work focuses on the speed-up of the computation
[ 1 , 32 , 51 ], while others focus on the refinement and enrichment of the model. For
example, topic-sensitive PageRank [ 35 ] and query-dependent PageRank [ 64 ]intro-
duce topics and assume that the endorsement from a page belonging to the same
topic is larger than that from a page belonging to a different topic. Other variations
of PageRank include those modifying the 'personalized vector' [ 34 ], changing the
'damping factor' [ 6 ], and introducing inter-domain and intra-domain link weights
[ 46 ]. Besides, there is also some work on the theoretic issues of PageRank [ 5 , 33 ].
Langville et al. [ 46 ] provide a good survey on PageRank and its related work.
Algorithms that can generate robust importance ranking against link spam have
also been proposed. For example, TrustRank [ 29 ] is an importance ranking algo-
rithm that takes into consideration the reliability of web pages when calculating the
importance of pages. In TrustRank, a set of reliable pages are first identified as seed
pages. Then the trust of a seed page is propagated to other pages on the web link
graph. Since the propagation in TrustRank starts from reliable pages, TrustRank can
be more spam-resistant than PageRank.
9 If there are web pages without any inlinks (which is usually referred to as dangling nodes in the
graph), some additional heuristics is needed to avoid rank leak.
Search WWH ::




Custom Search