Search Engine Performance Comparisons - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications


and precision measures do not work well is that Web search systems cannot practically identify and
retrieve every document in the whole collection that is relevant to a search query, which the
recall/precision measure requires. The third reason is that recall and precision form a pair of
numbers, so ordinary users cannot easily read and interpret at a glance what the measure means.
To tackle this third problem, researchers (see the summary in Korfhage 1997) have proposed many
single-value measures, such as the estimated search length ESL (Cooper 1968), the averaged search
length ASL (Losee 1998), the F harmonic mean, the E-measure, and others.

Meng (2006) compares the effectiveness of various single-value measures on a set of real-life Web
search data. The study covers the use and the results of ASL, ESL, average precision, F-measure,
E-measure, and RankPower, applied against a set of Web search results. The experimental data were
collected by sending 72 randomly chosen queries to AltaVista (AltaVista 2005) and MARS (Chen &
Meng 2002, Meng & Chen 2005).

The classic measures of the user-oriented performance of an IR system are precision and recall,
which can be traced back to the 1960s (Cleverdon et al. 1966, Treu 1967). Assume a collection
of N documents, of which N_r are relevant to the search query. When a query is issued, the IR
system returns a list of L results, where L <= N, of which L_r are relevant to the query.
Precision P and recall R are defined as follows:

$$P = \frac{L_r}{L} \quad \text{and} \quad R = \frac{L_r}{N_r} \qquad (1)$$
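As a minimal sketch of equation (1), the two ratios can be computed from a result list and a set of relevance judgments. The function name `precision_recall` and the document ids are hypothetical, chosen only for illustration, and the sketch assumes that the complete set of relevant documents is known and that the result list contains no duplicate ids.

```python
def precision_recall(retrieved, relevant):
    """Return (P, R) for a single query.

    retrieved: list of document ids returned by the system (its length is L)
    relevant:  set of all relevant document ids in the collection (its size is N_r)
    Assumes the ids in `retrieved` are unique.
    """
    hits = sum(1 for doc in retrieved if doc in relevant)  # L_r
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 10 results returned, 4 of them relevant,
# with 8 relevant documents in the whole collection.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d3", "d5", "d7", "d11", "d12", "d13", "d14"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # P = 4/10, R = 4/8
```

The need to pass in the full `relevant` set is what makes recall hard to obtain for a Web-scale collection, where that set is generally unknown.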

Note that 0 <= P <= 1 and 0 <= R <= 1. Essentially, precision measures the portion of the
retrieved results that are relevant to the query, and recall measures the percentage of the
relevant results in the document set that are retrieved. A typical way of measuring precision and
recall is to compute the precision at each recall level. A common method is to divide the recall
range into 10 intervals, giving 11 recall levels from 0.0 to 1.0, and to calculate the precision
at each of these levels. The goal is to have a high precision rate as well as a high recall rate.
Several other measures are related to precision and recall. Average precision and recall
(Korfhage 1997) computes the average of recall and precision over a set of queries. The average
precision at seen relevant documents (Baeza-Yates 1999) takes the average of the precision values
obtained after each new relevant document is observed. The R-precision measure (Baeza-Yates 1999)
assumes that the total number of relevant documents R in the document collection is known (here R
is a count, not the recall R defined above), and computes the precision at the R-th retrieved
document. The E measure

$$E = 1 - \frac{1 + \beta^2}{\dfrac{\beta^2}{R} + \dfrac{1}{P}} \qquad (2)$$

was proposed in (Van Rijsbergen 1974) and can vary the relative weight of precision and recall by
adjusting the parameter β between 0 and 1. In the extreme case when β is 0, E = 1 - P, where
recall has the least effect, and when β is 1,
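The single-value measures described in this passage can be illustrated with a short sketch. The function names, the ranked list, and the relevance judgments below are hypothetical and are not taken from the experiments cited above. The E measure is coded as E = 1 - (1 + β²) / (β²/R + 1/P), and the handling of zero precision or zero recall is an assumption made here so that the function never divides by zero.

```python
def average_precision_at_seen_relevant(ranking, relevant):
    """Average of the precision values observed at each relevant result."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision after this relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0


def r_precision(ranking, relevant):
    """Precision over the first R results, where R = number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def e_measure(precision, recall, beta):
    """E measure for given precision, recall, and weight beta."""
    if precision == 0:
        return 1.0  # assumption: 1/P diverges, so the fraction tends to 0
    if recall == 0:
        # assumption: with beta == 0 the recall term vanishes, leaving 1 - P
        return 1.0 - precision if beta == 0 else 1.0
    return 1.0 - (1.0 + beta ** 2) / (beta ** 2 / recall + 1.0 / precision)


# Hypothetical ranked list of 5 results; 4 documents are relevant overall.
ranking = ["d3", "d9", "d1", "d7", "d5"]
relevant = {"d3", "d1", "d5", "d8"}

ap = average_precision_at_seen_relevant(ranking, relevant)  # hits at ranks 1, 3, 5
rp = r_precision(ranking, relevant)                         # 2 hits in the top 4
e0 = e_measure(precision=3 / 5, recall=3 / 4, beta=0.0)     # reduces to 1 - P
print(ap, rp, e0)
```

With β set to 0 the recall term drops out of the denominator, which matches the E = 1 - P case stated in the text.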
