interface as a common Web user would see. Thus the authors decided not to use Google as a comparison.
The data from Vivisimo was also not collected because of Vivisimo's relatively small data sets.
Also collected in this set of experiments are the total number of relevant pages that each search
engine claims to have for a given query, and the processing time that the search engine takes to service
the query. The processing time is typically listed on each results page; that is, search engines process and return each page of results separately. Figure 1 illustrates this point, showing that there are a total of about 36,500,000 pages related to the query “thread”, and that it took Yahoo! 0.1 seconds to process the first page. Other search engines, including MSE, have similar features.
Results and Analysis
In this section, we present the results from the experiments and some observations about the results.
The first set of results reported here is the search quality. This is measured by the average number of relevant URLs among the first 20 returned URLs, the average rank, and the RankPower. Notice that the RankPower measure has a theoretical lower bound of 0.5; the closer to that value, the better the search quality. Table 3 shows the results from the five search engines we tested.
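As a sketch of how these figures can be computed, assuming RankPower is the average rank of the relevant URLs divided by their count, and the revised RankPower is the ideal rank sum divided by the observed rank sum (definitions inferred here; they are consistent with the values in Table 3, and the function names are our own):

```python
def rank_power(relevant_ranks):
    """RankPower: average rank of the relevant URLs divided by the
    number of relevant URLs.  Lower is better; the value approaches
    the theoretical lower bound 0.5 when every returned URL is
    relevant and the result list is long."""
    n = len(relevant_ranks)
    return (sum(relevant_ranks) / n) / n

def revised_rank_power(relevant_ranks):
    """Revised RankPower (assumed definition): the ideal rank sum
    1 + 2 + ... + n divided by the observed rank sum.  It lies in
    (0, 1]; higher is better."""
    n = len(relevant_ranks)
    return (n * (n + 1) / 2) / sum(relevant_ranks)

# If all of the first 10 returned URLs are relevant:
ranks = list(range(1, 11))
print(rank_power(ranks))          # 0.55, close to the 0.5 lower bound
print(revised_rank_power(ranks))  # 1.0, the best possible value
```

Under these assumed definitions, Google's row in Table 3 is reproduced: an average rank of 10.33 over 13.52 relevant URLs gives 10.33 / 13.52 ≈ 0.76.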
From the table one can tell that Google has the most favored RankPower measure because it contains
the highest average number of relevant URLs (13.52) in the results AND these relevant URLs are placed
relatively high on the returned list (average 10.33). On the other hand, MSE does not seem to fare well by the RankPower measure. However, Microsoft's new search engine seemed to have included
a very diverse array of results for the queries that we sent to it, while Google's results seemed to be
more focused. For example, when the “basketball” query was given to Microsoft, the results included
scouting/recruitment and high school basketball. Google focused on the more popular NBA and col-
legiate levels of basketball. This seems fairly self-evident: Google became the search leader because of
its high rate of return for more popular results based on its PageRank algorithm (Brin & Page, 1998).
MSE seems to return more diverse results with “high novelty”. This observation is supported by the results from a number of queries. If the raw number of relevant URLs does not convey the significance intuitively, the percentage of relevant pages among the total number of returned pages gives us more information. The average ranks from the different search engines do not differ greatly, ranging from
10.32 to 10.56. Thus a measurement of their “deviation” becomes important. The RankPower measure
captures some sense of the deviation of a set of values. The RankPower value of Google for example
Table 3. Average number of relevant URLs, average rank, and RankPower for the 27 queries, measured from the first 20 returned results

Search Engine   Avg. No. Relevant URLs   Pcnt. of Relevant URLs   Avg. Rank   RankPower   Revised RankPower
AlltheWeb       13.33                    67%                      10.56       0.79        0.68
Google          13.52                    68%                      10.33       0.76        0.70
MSE             10.81                    54%                      10.32       0.95        0.57
Vivisimo        13.15                    66%                      10.32       0.78        0.69
Yahoo           12.19                    61%                      10.39       0.85        0.63
 