Information Technology Reference
In-Depth Information
16.2.3 The Listwise Approach
For the listwise approach, let
be the output space, whose elements are permuta-
tions of m documents, denoted as π y . Then ( x y ) can be regarded as a random
variable sampled from the space
Y
m
X
× Y
according to an unknown probability dis-
tribution P .
Suppose the listwise loss function is L(f
;
x y ) . Then the expected risk can be
represented as
R(f )
=
L(f
;
x y )P (d x ,dπ y ).
(16.13)
m
X
× Y
Intuitively, the expected risk means the loss that a ranking model f would make
for all the m documents associated with a random query q . Since it is almost im-
possible to compute the expected risk, in practice, the empirical risk on the training
set is used as an estimate of the expected risk. In particular, given the training data
{
( x (i) (i y )
n
}
i = 1 ,the empirical risk can be defined as follows:
n
L f ;
x (i) (i y .
1
n
R(f ) =
(16.14)
i
=
1
16.3 Two-Layer Ranking Framework
As pointed out in [ 9 ] and [ 4 ], the aforementioned two frameworks have their limi-
tations in analyzing the ranking algorithms for information retrieval. The document
ranking framework ignores the existence of queries, while the subset ranking frame-
work ignores the sampling of documents. In contract, it is possible to sample both
more queries and more documents to label in real information retrieval scenarios.
Therefore, one should consider both queries and documents as random variables
and investigate the theoretical properties when the numbers of both queries and
documents approach infinity. This is exactly the motivation of the two-layer ranking
framework.
16.3.1 The Pointwise Approach
Let
be the query space. Each query q is assumed to be a random variable sam-
pled from the query space with an unknown probability distribution P Q
Q
. Given this
query, (x j ,y j ) is assumed to be a random variable sampled according to probability
distribution
D q (which is dependent on query q ).
Search WWH ::




Custom Search