The spam farm consists of the spammer's own pages, organized in a special way as seen
on the right, and some links from the accessible pages to the spammer's pages. Without
some links from the outside, the spam farm would be useless, since it would not even be
crawled by a typical search engine.
Concerning the accessible pages, it might seem surprising that one can affect a page
without owning it. However, today there are many sites, such as blogs or newspapers, that
invite others to post their comments on the site. In order to get as much PageRank as possible
flowing to his own pages from outside, the spammer posts many comments such as “I agree.
Please see my article at www.mySpamFarm.com.”
In the spam farm, there is one page t, the target page, at which the spammer attempts to
place as much PageRank as possible. There are a large number m of supporting pages that
accumulate the portion of the PageRank that is distributed equally to all pages (the fraction
1 − β of the PageRank that represents surfers going to a random page). The supporting pages
also prevent the PageRank of t from being lost, to the extent possible, since some will be
taxed away at each round. Notice that t has a link to every supporting page, and every
supporting page links only to t.
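The link structure just described can be sketched as a small adjacency map. This is only an illustration: the function name build_spam_farm and the page labels "t", "s0", "s1", ... are hypothetical and do not come from the text.

```python
def build_spam_farm(m, target="t"):
    """Return the internal links of a spam farm as {page: [successors]}.

    The labels "t" and "s0".."s{m-1}" are illustrative names only.
    """
    supporters = [f"s{i}" for i in range(m)]
    # The target page links to every supporting page.
    links = {target: list(supporters)}
    # Every supporting page links only to the target page.
    for s in supporters:
        links[s] = [target]
    return links


farm = build_spam_farm(3)
print(farm)
```

Links from the accessible pages into t are deliberately left out here, since they belong to pages the spammer does not own.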
5.4.2 Analysis of a Spam Farm
Suppose that PageRank is computed using a taxation parameter β, typically around 0.85.
That is, β is the fraction of a page's PageRank that gets distributed to its successors at the
next round. Let there be n pages on the Web in total, and let some of them be a spam farm
of the form suggested in Fig. 5.16, with a target page t and m supporting pages. Let x be
the amount of PageRank contributed by the accessible pages. That is, x is the sum, over all
accessible pages p with a link to t, of the PageRank of p times β, divided by the number of
successors of p. Finally, let y be the unknown PageRank of t. We shall solve for y.
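As a minimal sketch of the definition of x, the sum can be written directly in Python. The function name outside_contribution and the input format (a list of PageRank and successor-count pairs for the accessible pages that link to t) are assumptions made for illustration, and the numbers in the usage line are made up.

```python
def outside_contribution(accessible_pages, beta=0.85):
    """Compute x, the PageRank that accessible pages contribute to t.

    accessible_pages: iterable of (pagerank, num_successors) pairs, one per
    accessible page p that links to t. This input format is an assumption.
    """
    # Each page p passes beta * PageRank(p), split equally among its successors.
    return sum(beta * pagerank / num_successors
               for pagerank, num_successors in accessible_pages)


# Two hypothetical accessible pages with made-up PageRank values.
x = outside_contribution([(0.01, 10), (0.02, 4)])
print(x)
```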
First, the PageRank of each supporting page is
βy/m + (1 − β)/n
The first term represents the contribution from t. The PageRank y of t is taxed, so only βy
is distributed to t's successors. That PageRank is divided equally among the m supporting
pages. The second term is the supporting page's share of the fraction 1 − β of the PageRank
that is divided equally among all pages on the Web.
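The expression for a supporting page's PageRank can be checked numerically with a short sketch. The function name supporting_page_rank and the sample values of y, m, and n below are illustrative assumptions, not figures from the text.

```python
def supporting_page_rank(y, m, n, beta=0.85):
    """PageRank of one supporting page: beta*y/m + (1 - beta)/n."""
    from_target = beta * y / m     # t's taxed PageRank, split among m supporters
    from_random = (1 - beta) / n   # equal share of the fraction 1 - beta
    return from_target + from_random


# Hypothetical values: y = 0.1, m = 100 supporting pages, n = 1000 pages in total.
print(supporting_page_rank(0.1, 100, 1000))
```

With these made-up values the two terms are 0.00085 and 0.00015, so the contribution from t dominates the random-surfer share.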
Now, let us compute the PageRank y of target page t. Its PageRank comes from three
sources:
(1) Contribution x from outside, as we have assumed.
(2) β times the PageRank of every supporting page; that is,