Information Technology Reference
In-Depth Information
G
Definition 17.1
For a function class
, the empirical RA is defined as
n
1
n
R
(
G
)
=
E
σ
sup
g
σ
i
g(z
i
),
(17.9)
∈
G
i
=
1
where
z
i
,i
=
1
,...,n
are i.i.d. random variables, and
σ
i
,i
=
1
,...,n
are i.i.d. ran-
1
2
dom variables, with probability
of taking a value of
+
1or
−
1.
Based on the RA theory [
4
,
5
], the following generalization bounds have been
derived. Here it is assumed that
∀
x
∈
X
,
x
≤
M
, and the scoring function
f
is
F
={
x
→
w
T
x
:
w
≤
B.
}
learned from the linear function class
, for simplicity.
In this case, one has
∀
x
∈
X
,
|
f(x)
|≤
BM
.
Theorem 17.5
Let
x
,π
y
) be the
corresponding listwise loss
.
Given the training data S
={
(
x
(i)
,π
(i
y
), i
=
A
denote ListNet or ListMLE
,
and let L
(f
;
A
1
,...,n
}
,
m
∀
f
∈
F
,(
x
,π
y
)
∈
X
×
Y
,L
A
(f
;
x
,π
y
)
∈[
0
,
1
]
,
with probability at least
1
−
δ
(0
<δ<
1),
the following inequality holds
:
2log
δ
n
R
φ
(f )
−
R
φ
(f )
≤
2
C
A
(ϕ)N(ϕ)
R
(
F
)
+
sup
f
,
(17.10)
∈
F
R
where
(
F
) is the RA of the scoring function class
(
for the linear scoring function
,
R
ϕ
(x) measures the smoothness
of the transformation function ϕ
;
C
A
(ϕ) is an algorithm-dependent factor
.
2
BM
√
n
we have
(
F
)
≤
);
N(ϕ)
=
sup
x
∈[−
BM,BM
]
The expressions of
N(ϕ)
and
C
A
(ϕ)
for ListNet and ListMLE, with respect to
three representative transformation functions are listed in Table
17.1
.
1
From Theorem
17.5
, one can see that when the number of training queries
n
approaches infinity, the query-level generalization bound will converge to zero at a
rate of
O(
1
√
n
)
. Furthermore, by comparing the query-level generalization bound for
different listwise ranking algorithms, and with regards to different transformation
functions, one can have the following observations.
•
The query-level generalization bound for ListMLE is much tighter than that for
ListNet, especially when
m
, the length of the list, is large.
•
The query-level generalization bound for ListMLE decreases monotonously,
while that of ListNet increases monotonously, with respect to
m
.
•
The linear transformation function is the best choice in terms of the query-level
generalization bound in most cases.
1
The three transformation functions are
Linear Functions:
ϕ
L
(x)
=
ax
+
b, x
∈[−
BM, BM
]
.
Exponential Functions:
ϕ
E
(x)
=
e
ax
,x
∈[−
BM, BM
]
.
1
1
+
e
−
ax
,x
∈[−
Sigmoid Functions:
ϕ
S
(x)
=
BM, BM
]
.