Database Reference
In-Depth Information
5.7 References for Chapter 5
Web, which we used to justify the existence of dead ends and spider traps, were described
of spam mass is taken from [
3
].
[1] S. Brin and L. Page, “Anatomy of a large-scale hypertextual web search engine,”
Proc. 7th Intl. World-Wide-Web
Conference
, pp. 107-117, 1998.
[2] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Weiner, “Graph
structure in the web,”
Computer Networks
33:1-6, pp. 309-320, 2000.
[3] Z. Gyöngi, P. Berkhin, H. Garcia-Molina, and J. Pedersen, “Link spam detection based on mass estimation,”
Proc.
32nd Intl. Conf. on Very Large Databases
, pp. 439-450, 2006.
[4] Z. Gyöngi, H. Garcia-Molina, and J. Pedersen, “Combating link spam with trustrank,”
Proc. 30th Intl. Conf. on
Very Large Databases
, pp. 576-587, 2004.
[5] T.H. Haveliwala, “Efficient computation of PageRank,” Stanford Univ. Dept. of Computer Science technical re-
port, Sept., 1999. Available as
[6] T.H. Haveliwala, “Topic-sensitive PageRank,”
Proc. 11th Intl. World-Wide
-
Web Conference
, pp. 517-526, 2002
[7] J.M. Kleinberg, “Authoritative sources in a hyperlinked environment,”
J. ACM
46:5, pp. 604-632, 1999.
1
Link spammers sometimes try to make their unethicality less apparent by referring to what they do as “search-engine
optimization.”
2
The term PageRank comes from Larry Page, the inventor of the idea and a founder of Google.
3
They are so called because the programs that crawl the Web, recording pages and links, are often referred to as
“spiders.” Once a spider enters a spider trap, it can never leave.
4
You might suppose that the entire out-component and all the tendrils will be removed, but remember that they can
have within them smaller strongly connected components, including spider traps, which cannot be deleted.
5
Because
M
is not sparse, this representation is not very useful for
M
. However, the example illustrates the process of
representing matrices in general, and the sparser the matrix is, the more this representation will save.
7
Technically, the condition for this method to work is more restricted than simply “strongly connected.” However, the
other necessary conditions will surely be met by any large strongly connected component of the Web that was not arti-
ficially constructed.