EXPERIMENT METHODS
Two separate sets of experiments were conducted to study the performance of MSE and the other chosen
peer search engines. The first set of experiments collects statistics about search quality, such as the
relevance of the returned search results. A list of 27 randomly chosen queries is sent to the five search
engines. The first 20 returned URLs (typically spanning two pages) are examined manually by the authors.
Before sending each query to a search engine, the authors determine what types of URLs are deemed
relevant. Hence, when the results are returned, only those URLs that match the pre-determined
interpretation of the query are considered relevant. For example, when querying "thread", we wanted to
see Web pages relating to thread programming as commonly understood in computer science, not "thread"
as used in the textile industry. The number of relevant URLs and their positions in the returned list (ranks)
are recorded, and the average rank and RankPower are computed. This experiment took place between
November and December of 2004 for AlltheWeb, Google, Vivisimo, and Yahoo!. The data collection
for MSE took place in December of 2004 and January of 2005.
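As a concrete illustration, the short Java sketch below computes the average rank and RankPower from a list of recorded ranks. The RankPower formula used here (average rank divided by the number of relevant URLs, so that smaller values are better) and the sample ranks are assumptions for illustration, not values taken from the experiments.

    import java.util.List;

    /** Sketch of the relevance statistics described above (assumed RankPower definition). */
    public class RelevanceStats {

        /** Average rank of the relevant URLs, e.g. ranks 1, 3, 7 give (1 + 3 + 7) / 3. */
        static double averageRank(List<Integer> relevantRanks) {
            int sum = 0;
            for (int rank : relevantRanks) {
                sum += rank;
            }
            return (double) sum / relevantRanks.size();
        }

        /** Assumed definition: RankPower = average rank / number of relevant URLs (smaller is better). */
        static double rankPower(List<Integer> relevantRanks) {
            return averageRank(relevantRanks) / relevantRanks.size();
        }

        public static void main(String[] args) {
            // Hypothetical example: 5 relevant URLs found among the first 20 returned results.
            List<Integer> ranks = List.of(1, 2, 5, 9, 14);
            System.out.printf("average rank = %.2f%n", averageRank(ranks));
            System.out.printf("RankPower    = %.2f%n", rankPower(ranks));
        }
    }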
The second set of experiments examined some "hard" statistics, which include the average response
time and the number of relevant URLs from the search engine's point of view. To obtain these statistics,
a set of client programs was developed in Java, one for each of the search engines. A client program
can send queries to a search engine automatically. The duration between the time a query is sent and
the time the responses are received from a search engine is recorded using Java's System.current-
TimeMillis() method, which reports the wall-clock time in millisecond resolution. The client programs
run three times a day for a few days. Each time a client program runs, four queries are
sent to its search engine in sequence. The average response times for the first five pages are computed.
Because it typically takes longer for a search engine to respond to a query the first time (the first page
returned), the statistics for the first returned page are collected separately from those for the remaining
pages. We ran this set of experiments with three of the five studied search engines: AlltheWeb, MSE, and
Yahoo!. We did not run the experiment for Google because it does not respond to programmed queries
through the browser interface. Although Google provides a nice set of APIs (Google API, 2005) to query
its data collection directly, the information provided through the API is not exactly the same as that
through the browser interface.
Figure 1. Yahoo! results page showing 36,500,000 relevant results for the query "thread" and a processing time of 0.10 seconds