Chapter 1
Introduction
Abstract In this chapter, we give a brief introduction to learning to rank for
information retrieval. Specifically, we first introduce the ranking problem by taking
document retrieval as an example. Second, conventional ranking models proposed in
the literature of information retrieval are reviewed, and widely used evaluation
measures for ranking are mentioned. Third, the motivation for using machine learning
technology to solve the problem of ranking is given, and existing learning-to-rank
algorithms are categorized and briefly described.
1.1 Overview
With the fast development of the Web, every one of us is experiencing a flood of
information. A study¹ conducted in 2005 estimated that the World Wide Web contained
11.5 billion pages by January 2005. In the same year, Yahoo!² announced that its
search engine index contained more than 19.2 billion documents. It was estimated
by http://www.worldwidewebsize.com/ that there were about 25 billion pages
indexed by major search engines as of October 2008. Recently, the Google blog³
reported that about one trillion web pages have been seen during their crawling and
indexing. These figures show that the number of web pages is growing very fast.
The same is true of the number of websites: according to a report,⁴ their growth
from 2000 to 2007 is shown in Fig. 1.1.
The extremely large size of the Web makes it generally impossible for ordinary
users to locate the information they want by browsing the Web. As a consequence,
efficient and effective information retrieval has become more important than ever,
and the search engine (or information retrieval system) has become an essential tool
for many people.
A typical search engine architecture is shown in Fig. 1.2. As can be seen from the
figure, there are in general six major components in a search engine: crawler, parser,