Information Technology Reference
In-Depth Information
and precision measures do not work well is that Web search systems cannot practically identify and
retrieve all the documents in the whole collection that are relevant to a search query, which
the recall/precision measure requires. The third reason is that recall and precision form
a pair of numbers, which ordinary users cannot easily read and interpret at a glance.
Researchers (see the summary in Korfhage 1997) have proposed many single-value measures, such
as estimated search length ESL (Cooper 1968), averaged search length ASL (Losee 1998), the F harmonic
mean, the E-measure and others, to tackle the third problem.
Meng (2006) compares the effectiveness of various single-value measures on a set of real-life Web
search data. The study examines the use and the results of ASL, ESL, average precision, F-measure,
E-measure, and RankPower when applied to a set of Web search results. The experimental data was
collected by sending 72 randomly chosen queries to AltaVista (AltaVista, 2005) and MARS (Chen &
Meng 2002, Meng & Chen 2005).
The classic measures of user-oriented performance of an IR system are precision and recall, which
can be traced back to the 1960s (Cleverdon et al. 1966, Treu 1967). Assume a collection
of N documents, of which N_r are relevant to the search query. When a query is issued, the IR system
returns a list of L results, where L <= N, of which L_r are relevant to the query. Precision P and recall R
are defined as follows:
$$P = \frac{L_r}{L} \quad \text{and} \quad R = \frac{L_r}{N_r} \qquad (1)$$
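As a minimal sketch of Eq. (1), the following Python function computes P and R for a single query. The function name, the document ids, and the convention of returning 0.0 when a denominator is zero are illustrative assumptions, not part of the original text; the sketch also assumes each document id appears at most once in the result list.

```python
def precision_recall(retrieved, relevant):
    """Return (P, R) for one query, following Eq. (1).

    retrieved: list of document ids returned by the system (length L),
               assumed to contain no duplicates
    relevant:  collection of all relevant document ids (size N_r)
    """
    relevant = set(relevant)
    L = len(retrieved)
    N_r = len(relevant)
    # L_r: number of returned results that are relevant
    L_r = sum(1 for doc in retrieved if doc in relevant)
    # Assumed convention: a zero denominator yields 0.0
    P = L_r / L if L else 0.0
    R = L_r / N_r if N_r else 0.0
    return P, R


# Hypothetical example: 5 results returned, 4 relevant documents exist
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5", "d8"}
P, R = precision_recall(retrieved, relevant)
print(P, R)
```

In this hypothetical example two of the five returned results are relevant, and two of the four relevant documents are found, so L_r = 2, L = 5 and N_r = 4.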
Note that 0 <= P <= 1 and 0 <= R <= 1. Essentially, precision measures the portion of the
retrieved results that are relevant to the query, and recall measures the percentage of relevant
results that are retrieved out of the total number of relevant results in the document set. A typical
way of measuring precision and recall is to compute the precision at each recall level. A common method
is to divide the recall range into 10 intervals with 11 points from 0.0 to 1.0 and to calculate the
precision at each recall level. The goal is to have a high precision rate as well as a high recall rate.
Several other measures are related to precision and recall. Average precision and recall (Korfhage
1997) computes the average of recall and precision over a set of queries. The average precision at seen
relevant documents (Baeza-Yates 1999) takes the average of the precision values obtained after each new
relevant document is observed. The R-precision measure (Baeza-Yates 1999) assumes knowledge of the total
number of relevant documents R in the document collection (that is, N_r above, not the recall R). It
computes the precision over the first R retrieved
documents. The E measure
$$E = 1 - \frac{1 + \beta^2}{\frac{\beta^2}{R} + \frac{1}{P}} \qquad (2)$$
was proposed by Van Rijsbergen (1974); it varies the relative weight of precision and recall through
the parameter β, which is adjusted between 0 and 1. In the extreme cases, when β is 0, E = 1 - P, where
recall has the least effect, and when β is 1,
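The rank-based measures and the E measure of Eq. (2) described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the cited studies: all function names and the sample data are hypothetical, the 11-point computation assumes the common interpolation rule (the precision at a recall level is the highest precision reached at that recall or any higher one), and returning E = 1 when P or R is zero is an assumed convention.

```python
def precisions_at_relevant(ranking, relevant):
    """Precision observed each time a new relevant document is seen."""
    relevant = set(relevant)
    hits, values = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            values.append(hits / rank)
    return values


def average_precision_at_seen(ranking, relevant):
    """Average of the precision values at each seen relevant document."""
    values = precisions_at_relevant(ranking, relevant)
    return sum(values) / len(values) if values else 0.0


def r_precision(ranking, relevant):
    """Precision over the first n_rel results, n_rel = number of relevant docs."""
    relevant = set(relevant)
    n_rel = len(relevant)
    if n_rel == 0:
        return 0.0
    return sum(1 for doc in ranking[:n_rel] if doc in relevant) / n_rel


def eleven_point_precision(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    Assumption: interpolated precision at a level is the maximum precision
    observed at any recall greater than or equal to that level.
    """
    relevant = set(relevant)
    n_rel = len(relevant)
    values = precisions_at_relevant(ranking, relevant)
    points = []
    for i in range(11):
        # After the k-th relevant hit, recall is k / n_rel; compare with
        # i / 10 using integers to avoid floating-point rounding problems.
        reached = [p for k, p in enumerate(values, start=1) if k * 10 >= i * n_rel]
        points.append(max(reached) if reached else 0.0)
    return points


def e_measure(P, R, beta):
    """E measure of Eq. (2); assumed to be 1.0 when P or R is zero."""
    if P == 0 or R == 0:
        return 1.0
    return 1 - (1 + beta ** 2) / (beta ** 2 / R + 1 / P)


# Hypothetical ranked result list and relevance judgments
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5", "d8"}
print(average_precision_at_seen(ranking, relevant))
print(r_precision(ranking, relevant))
print(eleven_point_precision(ranking, relevant))
print(e_measure(0.4, 0.5, beta=0))
```

With beta set to 0 the function returns 1 - P up to floating-point rounding, which matches the extreme case stated in the text.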
 