4. ESL Type 4 requests indicate that the user wants to examine one-tenth of all relevant documents; the metric is the number of irrelevant documents the user has to examine in order to achieve this goal. In this case, all relevant documents in the returned set of 200 have to be identified before the 10 percent can be counted. On average, AltaVista requires the user to examine about eight irrelevant documents before reaching the goal, while MARS requires fewer than one.
5. ESL Type 5 requests examine up to a certain number of relevant documents; the example quoted in Cooper's paper (Cooper, 1968) was five. For AltaVista, it takes about 26 irrelevant documents to find five relevant documents, while MARS requires only about 17. A sketch of how both request types can be computed follows this list.
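To make the two request types concrete, the following sketch counts the irrelevant documents a user must examine in a ranked result list before the goal is met. The function names and the boolean relevance-vector representation are our own illustration; for simplicity the sketch assumes a strict ranking, whereas Cooper's original ESL averages over documents tied at the same rank.

import math

def search_length(relevance, needed):
    # Count irrelevant documents examined before `needed` relevant
    # documents have been found. `relevance` is a list of booleans
    # (True = relevant) in rank order; returns None if the goal
    # cannot be met within the list.
    found = 0
    irrelevant_seen = 0
    for is_relevant in relevance:
        if found >= needed:
            break
        if is_relevant:
            found += 1
        else:
            irrelevant_seen += 1
    return irrelevant_seen if found >= needed else None

def esl_type4(relevance, fraction=0.1):
    # Type 4 goal: see a given fraction (here one-tenth) of all
    # relevant documents in the returned set.
    needed = math.ceil(fraction * sum(relevance))
    return search_length(relevance, needed)

def esl_type5(relevance, needed=5):
    # Type 5 goal: see a fixed number of relevant documents
    # (five in Cooper's example).
    return search_length(relevance, needed)

For a Type 4 request over a returned set of 200 documents, esl_type4 would be called with the full 200-element relevance vector; a Type 5 request simply calls esl_type5 with needed=5.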
Goals and Metrics of the Study
Since the early days of search engines in the early 1990s, relatively few performance studies of search engines have been available to the public. Researchers and engineers at Google published a few papers about their systems with some mention of performance (Ghemawat et al., 1999; Barroso et al., 2003). Most other performance comparisons come as news reports of users' perceptions, that is, how satisfied users feel with a particular search engine. The goal of this study is to assess the performance of MSE from a user's point of view with collected statistics. The study tries to answer the following questions. How long does it take for a search engine to respond to a user query? How many relevant results are there in total, from the search engine's point of view? Given that a typical user cannot examine all returned results, which typically number in the millions, how many of the top-20 results returned by a search engine are actually relevant to the query from a user's point of view? We also compare the performance of the search engines in these respects. The search engines involved in the study are Microsoft Search Engine (beta version) (MSE, 2005), AlltheWeb (ATW, 2008), Google (Google, 2008), Vivisimo (Vivisimo, 2008), and Yahoo! (Yahoo, 2008).
A number of performance metrics were measured in this study. The average response time is a measure of the duration between the time when a query is issued and the time when the response is received, as seen from the user's computer. Since a query typically retrieves hundreds or thousands of result pages, we measure separately the response time for the first page of URLs (typically 10 URLs) and for the following four pages, as sketched after this paragraph; the first page is measured separately because it takes much more time to generate than the rest of the pages. The second piece of statistics collected is the number of relevant URLs per query reported by the search engines. Although this is not necessarily a measure of how accurate the search results are, nor of how large a search engine's collection is, it is an interesting indication of the data set kept by a search engine. The third measurement is a user-perceived relevance measure for the queries. The authors sent 27 randomly chosen queries to MSE and the other peer search engines and manually examined the relevance of the first 20 results returned by each engine. The single-value measure RankPower (see the discussion in the previous section) is used to compare the performance of the selected search engines from an end-user's point of view.
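As a rough illustration of how the response-time statistics can be gathered, the sketch below times the first result page of a query separately from the following four pages. The base URL and the q and start parameter names are illustrative placeholders rather than the actual query interfaces of the engines in the study, and a real measurement run would repeat each query and average the timings.

import time
import urllib.parse
import urllib.request

def time_result_pages(base_url, query, pages=5, page_size=10):
    # Time each of the first `pages` result pages for one query.
    # The first page is reported separately from the rest, mirroring
    # the methodology described above.
    timings = []
    for page in range(pages):
        params = urllib.parse.urlencode({"q": query, "start": page * page_size})
        start = time.perf_counter()
        with urllib.request.urlopen(f"{base_url}?{params}") as response:
            response.read()  # include the transfer time of the whole result page
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]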
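For the user-perceived relevance measure, a minimal sketch of the RankPower computation over the manually judged top-20 results is given below. It assumes the definition from the preceding section, namely the average rank of the relevant documents divided by their number, so that smaller values are better and the lower bound of (n + 1)/(2n) is reached when all n relevant documents occupy the top positions; the function name and example ranks are our own.

def rank_power(relevant_ranks, n_returned=20):
    # `relevant_ranks` holds the 1-based positions of the documents
    # judged relevant among the top `n_returned` results of one query.
    n = len(relevant_ranks)
    if n == 0:
        return float("inf")  # no relevant documents in the top N
    assert all(1 <= r <= n_returned for r in relevant_ranks)
    average_rank = sum(relevant_ranks) / n
    return average_rank / n

# Example: five relevant documents at positions 1, 2, 4, 7, and 15
# of the top 20 results yield a RankPower of (29 / 5) / 5 = 1.16.
print(rank_power([1, 2, 4, 7, 15]))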