Database Reference
In-Depth Information
5.7 References for Chapter 5
The PageRank algorithm was first expressed in [ 1 ] . The experiments on the structure of the
Web, which we used to justify the existence of dead ends and spider traps, were described
in [ 2 ] . The block-stripe method for performing the PageRank iteration is taken from [ 5 ] .
Topic-sensitive PageRank is taken from [ 6 ] . TrustRank is described in [ 4 ], and the idea
of spam mass is taken from [ 3 ].
The HITS (hubs and authorities) idea was described in [ 7 ] .
[1] S. Brin and L. Page, “Anatomy of a large-scale hypertextual web search engine,” Proc. 7th Intl. World-Wide-Web
Conference , pp. 107-117, 1998.
[2] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Weiner, “Graph
structure in the web,” Computer Networks 33:1-6, pp. 309-320, 2000.
[3] Z. Gyöngi, P. Berkhin, H. Garcia-Molina, and J. Pedersen, “Link spam detection based on mass estimation,” Proc.
32nd Intl. Conf. on Very Large Databases , pp. 439-450, 2006.
[4] Z. Gyöngi, H. Garcia-Molina, and J. Pedersen, “Combating link spam with trustrank,” Proc. 30th Intl. Conf. on
Very Large Databases , pp. 576-587, 2004.
[5] T.H. Haveliwala, “Efficient computation of PageRank,” Stanford Univ. Dept. of Computer Science technical re-
port, Sept., 1999. Available as
http://infolab.stanford.edu/~taherh/papers/efficient-
pr.pdf
[6] T.H. Haveliwala, “Topic-sensitive PageRank,” Proc. 11th Intl. World-Wide - Web Conference , pp. 517-526, 2002
[7] J.M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM 46:5, pp. 604-632, 1999.
1 Link spammers sometimes try to make their unethicality less apparent by referring to what they do as “search-engine
optimization.”
2 The term PageRank comes from Larry Page, the inventor of the idea and a founder of Google.
3 They are so called because the programs that crawl the Web, recording pages and links, are often referred to as
“spiders.” Once a spider enters a spider trap, it can never leave.
4 You might suppose that the entire out-component and all the tendrils will be removed, but remember that they can
have within them smaller strongly connected components, including spider traps, which cannot be deleted.
5 Because M is not sparse, this representation is not very useful for M . However, the example illustrates the process of
representing matrices in general, and the sparser the matrix is, the more this representation will save.
6 This directory, found at www.dmoz.org , is a collection of human-classified Web pages.
7 Technically, the condition for this method to work is more restricted than simply “strongly connected.” However, the
other necessary conditions will surely be met by any large strongly connected component of the Web that was not arti-
ficially constructed.
Search WWH ::




Custom Search