Information Technology Reference
In-Depth Information
generalization over queries; and with the two-layer ranking framework, one is con-
cerned with the generalization over both queries and documents.
In this chapter, we will introduce existing works on generalization analysis for
ranking, with respect to different types of bounds and different ranking frameworks.
17.2 Uniform Generalization Bounds for Ranking
Several works have been done on the uniform generalization bounds of either the
true loss or the surrogate loss functions in ranking. Some of them fall into the doc-
ument ranking framework, while others fall into the subset ranking framework and
the two-layer ranking framework.
17.2.1 For Document Ranking
In [ 10 ], the uniform generalization ability for RankBoost is discussed in the context
of bipartite document ranking. That is, there are two levels of relevance degrees in
the data and there is no notion of query considered in the analysis. The general-
ization bound is defined with respect to the pairwise 0-1 loss. That is, if a pair of
documents (whose labels are positive and negative respectively) is correctly ranked,
the loss is zero, otherwise it is one. The corresponding empirical and expected risks
are defined as in (16.5) and (16.6) (see Chap. 16).
To perform the analysis, the VC dimension [ 16 , 17 ] is used. According to the
theoretical results obtained in [ 10 ] (see the foll owing the orem), the generalization
bound converges to zero at a rate of O( max
log (m + )
m +
, log (m )
m
) , where m +
{
}
and
m
are the numbers of relevant and irrelevant documents respectively.
F ,
which has a finite VC dimension V , the scoring functions f ( which are the weighted
combinations of the weak rankers ) belong to function class
Theorem 17.1 Assume that all the weak learners belong to function class
. Let S +
and S
F
be positive and negative samples of size m +
and m , respectively . That is , S + =
m +
j
m
j
{ x j }
x j }
and S + ={
1 . Then with probability at least 1
δ (0 <δ< 1), the
=
1
=
following inequality holds for
f F
:
V ( log 2 m +
V
log 1 δ
R 0 (f )
R 0 (f )
+
1 )
+
2
m +
V ( log 2 m
log 1 δ
V +
1 )
+
+
2
,
(17.1)
m
where V =
2 (V
+
1 )(T
+
1 ) log 2 (e(T
+
1 )) , T is the number of weak rankers in
RankBoost .
Search WWH ::




Custom Search