
• According to the above result, the zero values of the pairwise loss functions are not sufficient conditions for the zero value of (1 − MAP). This is weaker than our intuition suggests. As we know, when the pairwise loss function takes a zero value, every document pair is correctly classified, and correspondingly we will get the correct ranking of the documents. In this case, we should find that (1 − MAP) equals zero.

• It is not clear whether a similar result can also be obtained for NDCG and other evaluation measures.

The authors of [3] try to overcome the aforementioned limitations. Specifically, they show that many of the pairwise loss functions are upper bounds of a quantity, named the essential loss for ranking. Furthermore, the essential loss is an upper bound of (1 − NDCG) and (1 − MAP), and therefore these loss functions are also upper bounds of (1 − NDCG) and (1 − MAP).

To better illustrate this result, we first introduce the concept of the essential loss, which is constructed by modeling ranking as a sequence of classification tasks. Given a set of documents x and their ground-truth permutation π_y ∈ Ω_y, the ranking problem can be decomposed into several sequential steps. For each step t, one tries to distinguish π_y^{-1}(t), the document ranked at the t-th position in π_y, from all the documents ranked below the t-th position in π_y, using a scoring function f. If we denote x(t) = {x_{π_y^{-1}(t)}, ..., x_{π_y^{-1}(m)}}, one can define a classifier based on f, whose target output is π_y^{-1}(t),

\[
T \circ f\bigl(x(t)\bigr) = \arg\max_{j \in \{\pi_y^{-1}(t), \ldots, \pi_y^{-1}(m)\}} f(x_j). \tag{5.5}
\]

It is clear that there are m − t + 1 possible outputs of this classifier, i.e., {π_y^{-1}(t), ..., π_y^{-1}(m)}. The 0-1 loss for this classification task can be written as follows, where the second equality is based on the definition of T ∘ f:

\[
L_t\bigl(f; x(t), \pi_y^{-1}(t)\bigr)
= I_{\{T \circ f(x(t)) \neq \pi_y^{-1}(t)\}}
= 1 - \prod_{j=t+1}^{m} I_{\{f(x_{\pi_y^{-1}(t)}) > f(x_{\pi_y^{-1}(j)})\}}. \tag{5.6}
\]
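As a minimal sketch (not code from [3]), the step-t classifier of Eq. (5.5) and its 0-1 loss of Eq. (5.6) can be written as follows; the function names step_prediction and step_loss are illustrative choices rather than notation from the text.

```python
def step_prediction(scores, remaining):
    """T ∘ f(x(t)): among the documents still remaining at step t,
    return the one the scoring function f ranks highest (Eq. (5.5))."""
    return max(remaining, key=lambda j: scores[j])


def step_loss(scores, remaining, target):
    """L_t(f; x(t), π_y^{-1}(t)): 0-1 loss of the step-t classifier,
    where `target` is the document that π_y places at position t (Eq. (5.6))."""
    return 0 if step_prediction(scores, remaining) == target else 1
```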

A simple example is given in Fig. 5.1 to illustrate the aforementioned process of decomposition. Suppose there are three objects, A, B, and C, and the corresponding ground-truth permutation is π_y = (A, B, C). Suppose the output of the scoring function for these objects is (2, 3, 1), and accordingly the predicted ranked list is π = (B, A, C). At step one of the decomposition, the scoring function predicts object B to be at the top of the list. However, A should be on the top according to π_y. Therefore, a prediction error occurs. For step two, we remove A from both π_y and π. Then the scoring function predicts object B to be on the top of the remaining list. This is in accordance with π_y and there is no prediction error. After that, we further remove object B, and it is easy to verify that there is no prediction error in step three.
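To replay this example programmatically, the following sketch walks through the three steps with the same scores and ground truth; it assumes the step_prediction helper defined above (an illustrative name), and only the object names and scores come from the text.

```python
scores = {"A": 2, "B": 3, "C": 1}   # outputs of the scoring function f
ground_truth = ["A", "B", "C"]      # π_y = (A, B, C)

remaining = list(ground_truth)
for t, target in enumerate(ground_truth, start=1):
    predicted = step_prediction(scores, remaining)
    loss = 0 if predicted == target else 1
    print(f"step {t}: predicted {predicted}, target {target}, loss {loss}")
    remaining.remove(target)        # drop π_y^{-1}(t) before the next step

# Expected output: loss 1 at step one (B predicted, A expected),
# loss 0 at steps two and three, matching the description above.
```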