Table 17.1 N(φ) and C_A(φ) for ListNet and ListMLE (entries of N(φ), C^ListMLE(φ), and C^ListNet(φ) for the linear, exponential, and sigmoid transformation functions φ_L, φ_E, and φ_S)
17.2.3 For Two-Layer Ranking
As can be seen, the generalization bounds introduced in the previous subsections are defined over either documents or queries, mainly because of the statistical frameworks on which they are based. As a result, they cannot explain some empirical observations. For example, the experiments in [18] show that, given a fixed total number of judgments, neither extremely deep labeling (i.e., labeling a large number of documents for each of only a few queries) nor extremely shallow labeling (i.e., labeling only a few documents for each of a large number of queries) is a good choice for achieving the best test performance. This cannot be explained by any of the above generalization bounds.
To address the aforementioned issues, in [8], generalization analysis for pairwise ranking algorithms is conducted under the two-layer ranking framework. The expected and empirical risks are defined as in (16.17) and (16.18) (see Chap. 16). In particular, the following result has been obtained in [8], also based on the concept of the Rademacher average (RA) [4, 5].
Theorem 17.6 Suppose L is the loss function for pairwise ranking. Assume (1) L ∘ F is bounded by M, and (2) E[R_m(L ∘ F)] ≤ D(L ∘ F, m). Then with probability at least 1 − δ (0 < δ < 1), ∀f ∈ F,

R(f) ≤ R̂(f) + D(L ∘ F, n) + (1/n) ∑_{i=1}^{n} D(L ∘ F, m^{(i)}) + √(2M² log(2/δ) / n) + √((2M² log(2/δ) / n²) ∑_{i=1}^{n} 1/m^{(i)}),
where, for certain ranking function classes, D(L ∘ F, m) has an explicit form. For example, for the function class F = {f(x, x′) = f(x) − f(x′); f ∈ F}, whose VC dimension is V and for which |f(x)| ≤ B, it has been proven in [4, 5] that D(L_0 ∘ F, m) = c_1 √(V/m) and D(L_φ ∘ F, m) = c_2 Bφ′(B) √(V/m), where c_1 and c_2 are both constants.
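For illustration, suppose every query has the same number m of judged documents (a simplifying assumption for this sketch). Plugging the explicit form D(L_0 ∘ F, m) = c_1 √(V/m) into the bound of Theorem 17.6 then gives

R(f) ≤ R̂(f) + c_1 √(V/n) + c_1 √(V/m) + √(2M² log(2/δ) / n) + √(2M² log(2/δ) / (nm)),

in which the slack decreases when either n or m increases, but vanishes only when both grow.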
According to the above theorem, one may have the following discussions.
• Increasing the number of either queries or documents per query in the training data will enhance the two-layer generalization ability.
• Only if n → ∞ and m^{(i)} → ∞ simultaneously does the two-layer generalization bound uniformly converge (see the numerical sketch below).
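This trade-off can also be seen numerically. The following sketch (the constants c1, M, V, δ and the budget N below are arbitrary illustrative choices, not values from [8] or [18], and all queries are assumed to have m^{(i)} = m judged documents) evaluates the slack of the above bound under a fixed total number of judgments n · m.

# Illustrative sketch: shape of the two-layer bound's slack under a fixed
# labeling budget n * m = N, using the explicit form D(L_0 ∘ F, m) = c1 * sqrt(V/m).
# The constants c1, M, V, delta, N are arbitrary choices for illustration only.
import math

c1, M, V, delta, N = 1.0, 1.0, 10, 0.05, 10_000

def two_layer_slack(n, m):
    """Slack of the two-layer bound for n queries, each with m judged documents."""
    query_term = c1 * math.sqrt(V / n)      # D(L ∘ F, n)
    doc_term = c1 * math.sqrt(V / m)        # (1/n) Σ_i D(L ∘ F, m^(i)), all m^(i) = m
    dev_query = math.sqrt(2 * M**2 * math.log(2 / delta) / n)
    dev_doc = math.sqrt(2 * M**2 * math.log(2 / delta) / (n * m))
    return query_term + doc_term + dev_query + dev_doc

for n in (10, 100, 1000, 5000):
    m = N // n
    print(f"n={n:5d} queries, m={m:4d} docs/query, slack={two_layer_slack(n, m):.3f}")

With these illustrative constants, the balanced splits yield a smaller slack than either extreme, which is consistent with the empirical observation in [18] and with the two discussion points above.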