Table 17.1 N(ϕ) and C_A(ϕ) for ListNet and ListMLE, evaluated for the linear (ϕ_L), exponential (ϕ_E), and sigmoid (ϕ_S) transformation functions
17.2.3 For Two-Layer Ranking
As can be seen, the generalization bounds introduced in the previous subsections are taken over either documents or queries, mainly because of the statistical frameworks they are based on. As a result, they cannot explain some empirical observations. For example, the experiments in [ 18 ] show that, given a fixed total number of judgments, neither extremely deep labeling (i.e., labeling only a few queries but a large number of documents per query) nor extremely shallow labeling (i.e., labeling a large number of queries but only a few documents per query) is a good choice for achieving the best test performance. This is clearly not in accordance with any of the above generalization bounds.
To address the aforementioned issues, in [ 8 ], generalization analysis for pairwise ranking algorithms is conducted under the two-layer ranking framework. The expected and empirical risks are defined as in (16.17) and (16.18) (see Chap. 16). In particular, the following result has been obtained in [ 8 ], also based on the concept of the Rademacher average (RA) [ 4, 5 ].
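The empirical Rademacher average of a function class, R_m(F) = E_σ[sup_{f∈F} (1/m) Σ_i σ_i f(x_i)], can be estimated numerically by Monte Carlo over random sign vectors. The sketch below does this for an illustrative finite class of linear scorers; the class, sample, and all constants are hypothetical, chosen only to make the definition concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: m sample points and a small finite class of
# linear scoring functions f_a(x) = a * x (illustrative only).
m = 50
X = rng.uniform(-1.0, 1.0, size=m)
function_class = [lambda x, a=a: a * x for a in (-1.0, -0.5, 0.5, 1.0)]

def empirical_rademacher(functions, X, n_trials=2000, rng=rng):
    """Monte Carlo estimate of E_sigma[ sup_f (1/m) sum_i sigma_i f(x_i) ]."""
    m = len(X)
    # Precompute f(x_i) for every function: shape (|F|, m).
    values = np.array([[f(x) for x in X] for f in functions])
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)  # Rademacher signs
        total += np.max(values @ sigma) / m      # sup over the class
    return total / n_trials

ra = empirical_rademacher(function_class, X)
print(f"estimated empirical Rademacher average: {ra:.3f}")
```

As expected for a small class, the estimate shrinks as m grows, which is the behavior the D(L∘F, m) terms in the bounds below capture.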
Theorem 17.6 Suppose L is the loss function for pairwise ranking. Assume that (1) L∘F is bounded by M, and (2) E[R_m(L∘F)] ≤ D(L∘F, m). Then with probability at least 1 − δ (0 < δ < 1), for all f ∈ F,

R(f) ≤ R̂(f) + D(L∘F, n) + (1/n) Σ_{i=1}^n D(L∘F, m^(i)) + √(2M² log(2/δ)/n) + √( Σ_{i=1}^n 2M² log(2/δ)/(m^(i) n²) ),
where for certain ranking function classes D(L∘F, m) has an explicit form. For example, for the function class F = {f(x, x′) = f(x) − f(x′); f ∈ F}, whose VC dimension is V and for which |f(x)| ≤ B, it has been proven in [ 4, 5 ] that D(L₀∘F, m) = c₁√(V/m) and D(L_φ∘F, m) = c₂(B)√(V/m), where c₁ and c₂ are both constants.
According to the above theorem, one may have the following discussions.
• Increasing the number of either queries or documents per query in the training data will enhance the two-layer generalization ability.
• Only if n → ∞ and m^(i) → ∞ simultaneously does the two-layer generalization bound uniformly converge.
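Plugging the explicit form D(L∘F, m) = c√(V/m) into the two-layer bound makes the deep-versus-shallow labeling trade-off concrete: with a fixed labeling budget n·m, the gap terms are minimized at an intermediate split rather than at either extreme. The following sketch uses purely hypothetical constants (c, V, M, δ, and the budget) and assumes m^(i) = m for all queries:

```python
import math

# Hypothetical constants for illustration only; the theorem leaves
# c, V, and M unspecified for a given function class.
c, V, M, delta = 1.0, 10.0, 1.0, 0.05
budget = 10_000  # total labeling budget: n queries * m documents each

def D(m):
    """Assumed explicit form D(L∘F, m) = c * sqrt(V / m)."""
    return c * math.sqrt(V / m)

def two_layer_bound(n, m):
    """Gap terms of the two-layer bound, with m^(i) = m for every query."""
    log_term = 2 * M**2 * math.log(2 / delta)
    return (D(n)                              # query-layer complexity
            + D(m)                            # document-layer complexity
            + math.sqrt(log_term / n)         # query-layer deviation
            + math.sqrt(log_term / (m * n)))  # document-layer deviation

# Fixed budget: deep (few queries, many docs), balanced, shallow labeling.
for n in (10, 100, 1000):
    m = budget // n
    print(f"n={n:4d}  m={m:4d}  bound gap={two_layer_bound(n, m):.3f}")
```

Under these toy constants the balanced split (n = m = 100) gives a smaller gap than either extreme, matching the empirical observation in [ 18 ]; both n and m must grow for the gap to vanish.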