
(1) The U-statistics View
In the U-statistics view, suppose that $(x_u, y_u)$ and $(x_v, y_v)$ are i.i.d. random variables distributed according to $P$, where $X \in \mathcal{X}$ stands for the document and $Y \in \mathcal{Y}$ stands for the ground-truth label of the document. $(x_u, y_u)$ and $(x_v, y_v)$ construct a pair. Given the scoring function $f$, a loss occurs if the documents are not ranked according to their ground-truth labels. Suppose the loss function is $L(f; x_u, x_v, y_{u,v})$, where $y_{u,v} = 2 \cdot I_{\{y_u > y_v\}} - 1$. Again, this loss function can represent the pairwise 0-1 loss, or the pairwise surrogate loss functions used by different algorithms. Then the expected risk is defined as

$$R(f) = \int_{(\mathcal{X} \times \mathcal{Y})^2} L(f; x_u, x_v, y_{u,v})\, P(dx_u, dy_u)\, P(dx_v, dy_v). \qquad (16.3)$$

The expected risk measures the loss that a ranking model $f$ would incur on two random documents. Again, it is almost impossible to compute the expected risk exactly, so the empirical risk on the training set is used as an estimate of it. Given the training data, the empirical risk can be defined with the following U-statistic:

$$\hat{R}(f) = \frac{2}{m(m-1)} \sum_{u=1}^{m} \sum_{v=u+1}^{m} L(f; x_u, x_v, y_{u,v}). \qquad (16.4)$$
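As a concrete illustration (the function names and toy data below are assumptions for this sketch, not from the text), the U-statistic in Eq. (16.4) can be computed with the pairwise 0-1 loss as follows:

```python
from itertools import combinations

def y_pair(y_u, y_v):
    # Pair label y_{u,v} = 2 * I{y_u > y_v} - 1, as defined in the text
    return 1 if y_u > y_v else -1

def pairwise_01_loss(f, x_u, x_v, y_uv):
    # Loss of 1 when the scores disagree with the pair label, else 0
    return 1.0 if y_uv * (f(x_u) - f(x_v)) <= 0 else 0.0

def empirical_risk(f, docs, labels):
    # U-statistic of Eq. (16.4): 2 / (m (m - 1)) times the sum over u < v
    m = len(docs)
    total = sum(
        pairwise_01_loss(f, docs[u], docs[v], y_pair(labels[u], labels[v]))
        for u, v in combinations(range(m), 2)
    )
    return 2.0 * total / (m * (m - 1))

# Toy example: one scalar feature per document, scoring function f = identity.
docs = [3.0, 2.0, 1.0]
labels = [2, 1, 0]          # higher label means more relevant
print(empirical_risk(lambda x: x, docs, labels))  # 0.0: every pair is ordered correctly
```

A scorer that reverses the scores (e.g. `lambda x: -x`) mis-orders every pair and yields an empirical risk of 1.0.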

Specifically, when the ground truth is given as a binary relevance degree, the positive example, denoted $x^+$, is distributed according to $P^+$, and the negative example, denoted $x^-$, is distributed according to $P^-$. Given the training data $\mathbf{x}^+ = \{x_j^+\}_{j=1}^{m^+}$ and $\mathbf{x}^- = \{x_j^-\}_{j=1}^{m^-}$ (where $m^+$ and $m^-$ are the numbers of positive and negative examples in the training data, respectively), the expected risk and the empirical risk can be refined as follows:

$$R(f) = \int_{\mathcal{X}^2} L(f; x^+, x^-)\, P^+(dx^+)\, P^-(dx^-), \qquad (16.5)$$

$$\hat{R}(f) = \frac{1}{m^+ m^-} \sum_{u=1}^{m^+} \sum_{v=1}^{m^-} L(f; x_u^+, x_v^-). \qquad (16.6)$$

In this case, we usually call the problem a bipartite ranking problem.
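For intuition (a sketch with assumed names, not part of the original text): with the pairwise 0-1 loss, the bipartite empirical risk of Eq. (16.6) is simply the fraction of positive-negative pairs that the scoring function mis-orders, which equals one minus the AUC when there are no score ties:

```python
def bipartite_empirical_risk(f, x_pos, x_neg):
    # Eq. (16.6) with the pairwise 0-1 loss: fraction of
    # positive-negative pairs that f mis-orders (1 - AUC in the
    # absence of score ties).
    errors = sum(
        1.0 for xp in x_pos for xn in x_neg if f(xp) - f(xn) <= 0
    )
    return errors / (len(x_pos) * len(x_neg))

# Toy example with scalar documents, scoring function f = identity.
x_pos = [0.9, 0.8, 0.4]   # positive (relevant) documents
x_neg = [0.7, 0.3]        # negative (irrelevant) documents
print(bipartite_empirical_risk(lambda x: x, x_pos, x_neg))  # 1 of 6 pairs mis-ordered
```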

(2) The Average View
The average view assumes that document pairs are i.i.d. More specifically, with the average view, each document pair $(x_u, x_v)$ is given a ground-truth label $y_{u,v} \in \mathcal{Y} = \{-1, +1\}$, where $y_{u,v} = 1$ indicates that document $x_u$ is more relevant than $x_v$, and $y_{u,v} = -1$ otherwise. Then $(x_u, x_v, y_{u,v})$ is assumed to be a random variable with probabilistic distribution $P$, and the expected risk can be defined as

$$R(f) = \int_{\mathcal{X}^2 \times \mathcal{Y}} L(f; x_u, x_v, y_{u,v})\, P(dx_u, dx_v, dy_{u,v}). \qquad (16.7)$$
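Under the average view, the empirical counterpart of Eq. (16.7) is just the mean loss over the sampled labeled pairs, since the pairs themselves are treated as i.i.d. A minimal sketch with the pairwise 0-1 loss (the names and toy data are assumptions):

```python
def average_view_empirical_risk(f, pairs):
    # pairs: list of (x_u, x_v, y_uv) samples with y_uv in {-1, +1};
    # empirical counterpart of Eq. (16.7): mean pairwise loss.
    losses = [
        1.0 if y_uv * (f(x_u) - f(x_v)) <= 0 else 0.0
        for x_u, x_v, y_uv in pairs
    ]
    return sum(losses) / len(pairs)

# Toy example: y_uv = +1 means x_u should score higher than x_v.
pairs = [(0.9, 0.2, 1), (0.1, 0.8, -1), (0.3, 0.6, 1)]
print(average_view_empirical_risk(lambda x: x, pairs))  # 1 of 3 pairs mis-ordered
```

Note the contrast with the U-statistics view: here the pair labels are observed directly rather than derived from per-document ground-truth labels.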