16.1 Document Ranking Framework
In the document ranking framework [1-3, 5, 7, 10], only documents (together with
their labels) are treated as i.i.d. instances; no such assumption is made about
queries. Under this assumption, different kinds of risks can be defined for the
different approaches to learning to rank.
16.1.1 The Pointwise Approach
For the pointwise approach, suppose that $(x_j, y_j)$ are i.i.d. random variables
according to distribution $P$, where $x \in \mathcal{X}$ stands for the document and
$y \in \mathcal{Y}$ stands for the ground-truth label of the document. Given the
scoring function $f$, a loss occurs if the prediction given by $f$ is not in
accordance with the given label. Here we use $L(f; x_j, y_j)$ as a general
representation of loss functions. It can be the pointwise 0-1 loss, or the
surrogate loss functions used by various pointwise ranking algorithms.
Given the loss function, the expected risk is defined as

$$R(f) = \int_{\mathcal{X} \times \mathcal{Y}} L(f; x_j, y_j)\, P(dx_j, dy_j). \tag{16.1}$$
Intuitively, the expected risk is the loss that a ranking model $f$ would incur
on a random document. Since it is almost impossible to compute the expected risk,
in practice the empirical risk on the training set is used as an estimate of the
expected risk. In particular, given the training data $\{(x_j, y_j)\}_{j=1}^{m}$,
the empirical risk can be defined as follows:

$$\hat{R}(f) = \frac{1}{m} \sum_{j=1}^{m} L(f; x_j, y_j). \tag{16.2}$$
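To make (16.2) concrete, the following is a minimal Python sketch of the empirical risk as an average of per-document losses. The names `empirical_risk`, `zero_one_loss`, `squared_loss`, and `linear_scorer`, the toy data, and the choice to round the score before comparing it with the label are all assumptions made for illustration; they are not taken from the text.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a document x_j is a feature vector,
# its label y_j is a number (e.g. a relevance degree).
Document = Sequence[float]
Scorer = Callable[[Document], float]
Loss = Callable[[Scorer, Document, float], float]


def zero_one_loss(f: Scorer, x: Document, y: float) -> float:
    """Pointwise 0-1 loss: 1 if the rounded score differs from the label.

    Rounding the score to obtain a predicted label is an assumption of
    this sketch, not something prescribed by the framework.
    """
    return 0.0 if round(f(x)) == y else 1.0


def squared_loss(f: Scorer, x: Document, y: float) -> float:
    """One possible surrogate loss: squared error between score and label."""
    return (f(x) - y) ** 2


def empirical_risk(f: Scorer, data: List[Tuple[Document, float]], loss: Loss) -> float:
    """Average loss of f over the m training documents, as in (16.2)."""
    if not data:
        raise ValueError("training data must contain at least one document")
    return sum(loss(f, x, y) for x, y in data) / len(data)


def linear_scorer(x: Document) -> float:
    """Toy scoring function f used only for this example."""
    return 2.0 * x[0]


# Three toy documents with one feature each, paired with their labels.
training_data = [([0.0], 0), ([0.5], 1), ([1.0], 1)]

risk_01 = empirical_risk(linear_scorer, training_data, zero_one_loss)
risk_sq = empirical_risk(linear_scorer, training_data, squared_loss)
```

Passing the loss as a parameter mirrors the role of $L(f; x_j, y_j)$ as a general placeholder: the same averaging applies whether the 0-1 loss or a surrogate loss is plugged in.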
16.1.2 The Pairwise Approach
In the pairwise approach, document pairs are the learning instances. There are two
views on this approach in the document ranking framework. The first, which we call
the U-statistics view, assumes that documents are i.i.d. random variables; the
second, which we call the average view, assumes that document pairs are i.i.d.
random variables. Both views are valid under certain conditions. For example,
when the relevance degree of each document is used as the ground truth, the
U-statistics view is more reasonable. However, if the pairwise preferences between
documents are given as the ground-truth labels, it might be more reasonable to take
the average view.
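One way to picture the difference between the two views is through where the pair instances come from. The Python sketch below is an illustration only: the function names, the toy scores and labels, the decision to skip tied pairs, and the use of a pairwise 0-1 loss are all assumptions, and the formal risk definitions for each view are not reproduced here.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A pair instance (u, v, y_uv): y_uv = +1 if document u is preferred
# over document v, and -1 if v is preferred over u.
Pair = Tuple[int, int, int]


def pairs_from_relevance(labels: Sequence[int]) -> List[Pair]:
    """U-statistics-style setting: pairs are derived from per-document labels.

    Tied pairs are skipped, which is an assumption of this sketch.
    """
    pairs = []
    for u, v in combinations(range(len(labels)), 2):
        if labels[u] != labels[v]:
            pairs.append((u, v, 1 if labels[u] > labels[v] else -1))
    return pairs


def pairwise_zero_one_risk(scores: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Fraction of pairs whose score order disagrees with the preference."""
    if not pairs:
        raise ValueError("at least one pair is required")
    errors = sum(1 for u, v, y in pairs if y * (scores[u] - scores[v]) <= 0)
    return errors / len(pairs)


# Scores f(x_j) assigned by some hypothetical ranking model to three documents.
scores = [0.9, 0.2, 0.5]

# Setting 1: relevance degrees are the ground truth; pairs are derived.
relevance = [2, 0, 1]
derived_pairs = pairs_from_relevance(relevance)
risk_derived = pairwise_zero_one_risk(scores, derived_pairs)

# Setting 2: pairwise preferences are the ground truth and are given directly.
given_pairs = [(0, 1, 1), (2, 0, 1)]
risk_given = pairwise_zero_one_risk(scores, given_pairs)
```

In the first setting the pairs share documents and are therefore not independent of one another, which is why it is the documents that are modeled as i.i.d.; in the second setting the given pairs themselves are the natural unit to treat as i.i.d.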