β(βy/m + (1 − β)/n)
(3) (1 − β)/n, the share of the fraction 1 − β of the PageRank that belongs to t. This amount
is negligible and will be dropped to simplify the analysis.
Thus, from (1) and (2) above, we can write
y = x + βm(βy/m + (1 − β)/n) = x + β²y + β(1 − β)m/n
We may solve the above equation for y, yielding
y = x/(1 − β²) + c·m/n
where c = β(1 − β)/(1 − β²) = β/(1 + β).
EXAMPLE 5.11 If we choose β = 0.85, then 1/(1 − β²) = 3.6, and c = β/(1 + β) = 0.46. That
is, the structure has amplified the external PageRank contribution to 360% of its original
value, and has also obtained an amount of PageRank that is 46% of the fraction of the Web,
m/n, that is in the spam farm.
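
The arithmetic in the example can be checked with a short sketch. It assumes that the target's PageRank y satisfies y = x + βm(βy/m + (1 − β)/n), where x is the PageRank arriving from outside the farm, m is the number of supporting pages, and n is the number of pages on the Web. The function names and the sample values of x, m, and n are hypothetical, chosen only for illustration.

```python
def spam_farm_coefficients(beta):
    """Return (amplification, c) for a given beta.

    amplification = 1 / (1 - beta^2) multiplies the external contribution x.
    c = beta / (1 + beta) multiplies the farm's share of the Web, m / n.
    """
    amplification = 1.0 / (1.0 - beta ** 2)
    c = beta / (1.0 + beta)
    return amplification, c


def target_pagerank(x, m, n, beta, iterations=200):
    """Iterate y = x + beta*m*(beta*y/m + (1 - beta)/n) until it settles.

    The map shrinks errors by a factor of beta^2 per step, so it converges
    for any 0 <= beta < 1.
    """
    y = 0.0
    for _ in range(iterations):
        y = x + beta * m * (beta * y / m + (1.0 - beta) / n)
    return y


amplification, c = spam_farm_coefficients(0.85)
print(round(amplification, 2), round(c, 2))

# Hypothetical values: compare the iterated value with the closed form.
x, m, n = 0.001, 1_000, 1_000_000
closed_form = amplification * x + c * m / n
print(abs(target_pagerank(x, m, n, 0.85) - closed_form) < 1e-12)
```

Iterating the recurrence and evaluating the closed form are two routes to the same number, so agreement between them is a quick sanity check on the algebra.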
5.4.3 Combating Link Spam
It has become essential for search engines to detect and eliminate link spam, just as it was
necessary in the previous decade to eliminate term spam. There are two approaches to link
spam. One is to look for structures such as the spam farm in Fig. 5.16, where one page links
to a very large number of pages, each of which links back to it. Search engines surely search
for such structures and eliminate those pages from their index. That causes spammers to
develop different structures that have essentially the same effect of capturing PageRank for
a target page or pages. There is essentially no end to variations of Fig. 5.16, so this war
between the spammers and the search engines will likely go on for a long time.
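
As a rough illustration of the first approach, the sketch below flags pages that link to many pages which all link back. It is a naive check on a small in-memory graph, not a description of how any search engine actually works. The function name, the dictionary representation of the link graph, and the `min_supporters` threshold are all assumptions made for this example.

```python
def find_farm_like_targets(links, min_supporters=3):
    """Find pages that look like the target of a simple spam farm.

    links maps each page to the set of pages it links to.
    A page is flagged when at least min_supporters of the pages it
    links to also link back to it.
    """
    suspects = {}
    for page, outs in links.items():
        # Supporters are out-neighbours with a reciprocal link to this page.
        supporters = {
            q for q in outs
            if q != page and page in links.get(q, set())
        }
        if len(supporters) >= min_supporters:
            suspects[page] = supporters
    return suspects


# Hypothetical toy graph: t and three supporting pages, plus two ordinary pages.
toy_links = {
    "t": {"s1", "s2", "s3"},
    "s1": {"t"},
    "s2": {"t"},
    "s3": {"t"},
    "a": {"b", "t"},
    "b": {"a"},
}
print(find_farm_like_targets(toy_links))
```

A fixed pattern like this is easy for a spammer to evade by varying the structure, which is the weakness the paragraph above points out.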
However, there is another approach to eliminating link spam that doesn't rely on locating
the spam farms. Rather, a search engine can modify its definition of PageRank to lower the
rank of link-spam pages automatically. We shall consider two different formulas:
(1) TrustRank, a variation of topic-sensitive PageRank designed to lower the score of
spam pages.
(2) Spam mass, a calculation that identifies the pages that are likely to be spam and allows
the search engine to eliminate those pages or to lower their PageRank strongly.