Search Engine Performance Comparisons - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications


experimentally, which is the total number of retrieved relevant documents divided by the total number of retrieved documents; thus the quality indicator Q can be computed.

Except for the basic precision and recall measures, the rest of the aforementioned measures are single-value measures. They have the advantage of representing system performance in a single value, which makes it easier to understand and compare the performance of different systems. However, these single-value measures share a weakness in one of two areas: either they do not explicitly consider the positions of the relevant documents, or they do not explicitly consider the count of relevant documents. This makes the measures non-intuitive, and users of interactive IR systems such as Web search engines find their meaning difficult to grasp.

To alleviate the problems of using other single-value measures for Web search, Meng & Chen (2004) proposed a single-value measure called RankPower that combines the precision and the placements of the returned relevant documents. The measure is based on the concept of average ranks and the count of returned relevant documents. A closed-form expression of the optimal RankPower can be found, so that different Web information retrieval systems can be compared easily. The RankPower measure reaches its optimal value when all returned documents are relevant.

RankPower is defined as follows:

$$
\mathrm{RankPower}(N) = \frac{R_{avg}(N)}{n} = \frac{\sum_{i=1}^{n} S_i}{n^2} \qquad (7)
$$

where N is the total number of documents retrieved, n is the number of relevant documents among N, and S_i is the place (or position) of the i-th relevant document.
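As a minimal sketch of the definition above, the following Python function computes the original RankPower from the 1-based positions of the relevant documents in a result list. The function name `rank_power` and the choice to raise an error when no relevant document is returned are assumptions made for illustration; the source does not specify how that case is handled.

```python
def rank_power(relevant_ranks):
    """Original RankPower: average rank of the relevant documents
    divided by the number of relevant documents.

    relevant_ranks: 1-based positions S_i of the relevant documents
    within the N retrieved documents.
    """
    n = len(relevant_ranks)
    if n == 0:
        # Assumption: the measure is treated as undefined when
        # no relevant document is returned (division by zero).
        raise ValueError("RankPower is undefined without relevant documents")
    average_rank = sum(relevant_ranks) / n
    return average_rank / n


# All five returned documents are relevant: (1+2+3+4+5) / 5**2
print(rank_power([1, 2, 3, 4, 5]))  # 0.6
# A single relevant document at position 10
print(rank_power([10]))             # 10.0
```

Lower values are better under this definition, and the second call illustrates that the value grows with the position of a lone relevant document.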

While the physical meaning of RankPower as defined above is clear (the average rank divided by the count of relevant documents), the range of values it can take is difficult to interpret. The optimal value (the minimum) approaches 0.5 when all returned documents are relevant: the ranks are then 1 through n, so the average rank is (n + 1)/2 and RankPower is (n + 1)/(2n). It is not clear how to interpret this value in an intuitive way, i.e., why 0.5. The other issue is that RankPower is not bounded above. A single relevant document listed last in a list of m documents yields a RankPower value of m, and this value increases as the list size increases. In more recent work, Tang et al. (2007) proposed a revised RankPower measure defined as follows:

$$
\mathrm{RankPower}(N) = \frac{n(n+1)/2}{\sum_{i=1}^{n} S_i} = \frac{n(n+1)}{2\sum_{i=1}^{n} S_i} \qquad (8)
$$

where N is the total number of documents retrieved, n is the number of relevant documents among the retrieved ones, and S_i is the rank of the i-th retrieved relevant document. The beauty of this revision is that it constrains the values of RankPower to lie between 0 and 1, with 1 being the most favorable and 0 the least favorable. A minor drawback of this definition is that it loses the intuition of the original definition, namely the average rank divided by the count of relevant documents.

The experiment and data analysis reported in (Meng 2006) compared the RankPower measure with a number of other measures. While the exact numerical results may no longer be very relevant because they are dated, the data do show the effectiveness of the RankPower measure. The results show that
