In this section, we will introduce previous work according to the statistical ranking frameworks that they use.
17.3.1 For Document Ranking
In [ 1 , 3 ], stability is used as a tool to derive algorithm-dependent generalization bounds for pairwise ranking algorithms, under the framework of document ranking.
Let A be a ranking algorithm, and let L be the loss function that is minimized in A. Suppose we have learned a ranking model f_1 from m training documents using algorithm A. Then we replace a training document with another document and learn a new model f_2 from the new training data. We say that A has uniform loss stability β with respect to L if the difference between the losses with respect to f_1 and f_2 on any unseen document pair (x_u, x_v) is smaller than β(m). In other words, if the model learned from the training data is robust to a small change in the training data, the algorithm is regarded as having certain stability.
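To make the definition concrete, the sketch below (not from [ 1 , 3 ]) probes loss stability empirically: it swaps a single training document, retrains, and records the largest change in pairwise loss over held-out pairs. The learner `train` and the hinge-style pairwise loss are illustrative assumptions, not the algorithms studied in the referenced work.

```python
# Minimal sketch: empirically probing the uniform loss stability of a
# pairwise ranking algorithm. The training routine and loss are assumptions.
import numpy as np

def pairwise_loss(f, x_u, x_v, y_uv):
    """Hinge-style pairwise loss L(f; x_u, x_v, y_uv), with y_uv in {+1, -1}."""
    return max(0.0, 1.0 - y_uv * (f(x_u) - f(x_v)))

def train(X, y, lam=0.1):
    """Placeholder learner: a ridge-regularized linear scorer (illustrative only)."""
    w = np.linalg.solve(X.T @ X + lam * len(y) * np.eye(X.shape[1]), X.T @ y)
    return lambda x: float(w @ x)

def empirical_beta(X, y, x_new, y_new, test_pairs, lam=0.1):
    """Replace one training document, retrain, and measure the largest change
    in pairwise loss over unseen pairs -- an empirical proxy for beta(m)."""
    f1 = train(X, y, lam)                # model learned from the original sample
    X2, y2 = X.copy(), y.copy()
    X2[0], y2[0] = x_new, y_new          # swap a single training document
    f2 = train(X2, y2, lam)
    return max(abs(pairwise_loss(f1, xu, xv, yuv) - pairwise_loss(f2, xu, xv, yuv))
               for xu, xv, yuv in test_pairs)
```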
The generalization bound obtained in [ 1 , 3 ] is shown in Theorem 17.7.

Theorem 17.7 Let L be a loss function with L(f; x_u, x_v, y_{u,v}) ∈ [0, B], and suppose A has uniform loss stability β. Then with probability at least 1 − δ (0 < δ < 1), the following inequality holds:

$$
R(f_S) \le \hat{R}(f_S) + 2\beta(m) + \bigl(m\beta(m) + B\bigr)\sqrt{\frac{2\ln(1/\delta)}{m}}.
\tag{17.11}
$$
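The right-hand side of (17.11) is straightforward to evaluate numerically. The helper below simply transcribes the inequality; the numbers in the usage example are made up for illustration.

```python
import math

def stability_bound(emp_risk, beta_m, m, B, delta):
    """Right-hand side of (17.11): an upper bound on the expected risk R(f_S),
    given the empirical risk, loss stability beta(m), sample size m, the loss
    bound B, and the confidence parameter delta."""
    return emp_risk + 2 * beta_m + (m * beta_m + B) * math.sqrt(2 * math.log(1 / delta) / m)

# Illustrative numbers only: m = 10,000 training documents, losses in [0, 1],
# beta(m) = 1e-3, and confidence level 1 - delta = 0.95.
print(stability_bound(emp_risk=0.25, beta_m=1e-3, m=10_000, B=1.0, delta=0.05))
```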
For different algorithms, one can get different stability coefficients (note that not all algorithms are stable: for an unstable algorithm, the stability coefficient does not approach zero as the number of training documents m approaches infinity).
In particular, symmetric ranking algorithms with general regularization terms are
stable. For example, Ranking SVM is a stable algorithm, and its stability coefficient
is

$$
\beta(m) = \frac{16\kappa^2}{\lambda m},
\tag{17.12}
$$

where K(x, x) ≤ κ² < ∞ for all x ∈ X, and λ is the tradeoff coefficient for the regularization term in Ranking SVM.
As can be seen from the above discussions, for algorithms like Ranking SVM [ 11 , 12 ], the bound converges to zero at a rate of O(1/√m). This bound is tighter than the bound based on the Rank-Shatter Coefficient (see inequality ( 17.2 )) except for the case of using linear scoring functions in the one-dimensional feature space. This clearly shows the advantage of the algorithm-dependent bound.
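As a rough numerical illustration (with assumed values for κ, λ, B, and δ), plugging (17.12) into the gap terms of (17.11) exhibits the O(1/√m) decay: since mβ(m) = 16κ²/λ is constant, the dominant term shrinks with √m.

```python
import math

def ranking_svm_beta(m, kappa=1.0, lam=1.0):
    """Stability coefficient of Ranking SVM from (17.12): 16*kappa^2 / (lambda*m)."""
    return 16 * kappa**2 / (lam * m)

def generalization_gap(m, B=1.0, delta=0.05, kappa=1.0, lam=1.0):
    """Gap terms of (17.11) with Ranking SVM's beta(m) plugged in; because
    m*beta(m) = 16*kappa^2/lambda is constant, the gap decays as O(1/sqrt(m))."""
    b = ranking_svm_beta(m, kappa, lam)
    return 2 * b + (m * b + B) * math.sqrt(2 * math.log(1 / delta) / m)

for m in (10**3, 10**4, 10**5):
    # Each tenfold increase in m shrinks the gap by roughly a factor of sqrt(10).
    print(m, round(generalization_gap(m), 4))
```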