EXPERIMENT METHODS
Two separate sets of experiments were conducted to study the performance of MSE and the other chosen
peer search engines. The first set of experiments collects statistics about search quality, such as the
relevance of the returned search results. A list of 27 randomly chosen queries is sent to the five search
engines. The first 20 returned URLs (typically spanning two pages) are examined manually by the authors.
Before sending each query to a search engine, the authors determine what types of URLs are deemed
relevant. Hence, when the results are returned, only those URLs that match the pre-determined
interpretation of the query are considered relevant. For example, when querying "thread", we wanted to
see Web pages relating to thread programming as commonly understood in computer science, not "thread"
as used in the textile industry. The number of relevant URLs and their positions in the returned list (ranks)
are recorded, and the average rank and RankPower are computed. This experiment took place between
November and December of 2004 for AlltheWeb, Google, Vivisimo, and Yahoo!. The data collection
for MSE took place in December of 2004 and January of 2005.
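As a concrete illustration, the short Java sketch below computes the average rank and RankPower from a list of recorded ranks. The RankPower formula used here (average rank divided by the number of relevant URLs, so that smaller values are better) and the sample ranks are assumptions for illustration, not values taken from the experiments.

    import java.util.List;

    /** Sketch of the relevance statistics described above (assumed RankPower definition). */
    public class RelevanceStats {

        /** Average rank of the relevant URLs, e.g. ranks 1, 3, 7 give (1 + 3 + 7) / 3. */
        static double averageRank(List<Integer> relevantRanks) {
            int sum = 0;
            for (int rank : relevantRanks) {
                sum += rank;
            }
            return (double) sum / relevantRanks.size();
        }

        /** Assumed definition: RankPower = average rank / number of relevant URLs (smaller is better). */
        static double rankPower(List<Integer> relevantRanks) {
            return averageRank(relevantRanks) / relevantRanks.size();
        }

        public static void main(String[] args) {
            // Hypothetical example: 5 relevant URLs found among the first 20 returned results.
            List<Integer> ranks = List.of(1, 2, 5, 9, 14);
            System.out.printf("average rank = %.2f%n", averageRank(ranks));
            System.out.printf("RankPower    = %.2f%n", rankPower(ranks));
        }
    }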
The second set of experiments examined some "hard" statistics, which include the average response
time and the number of relevant URLs from the search engine's point of view. To obtain these statistics,
a set of client programs was developed in Java, one for each of the search engines. A client program
can send queries to a search engine automatically. The duration between the time a query is sent and
the time the responses are received from a search engine is recorded using Java's System.current-
TimeMillis() method, which reports the wall-clock time in millisecond resolution. The client programs
run three times a day for a few days. Each time a client program runs, four queries are
sent to its search engine in sequence. The average response times for the first five pages are computed.
Because it typically takes longer for a search engine to respond to a query the first time (the first page
returned), the statistics for the first returned page are collected separately from those for the remaining
pages. We ran this set of experiments with three of the five studied search engines: AlltheWeb, MSE, and
Yahoo!. We did not run the experiment for Google because it does not respond to programmed queries
through the browser interface. Although Google provides a nice set of APIs (Google API, 2005) to query
its data collection directly, the information provided through the API is not exactly the same as that
through the browser interface.
Figure 1. Yahoo! results page showing 36,500,000 relevant results for the query "thread" and a processing time of 0.10 seconds