Information Technology Reference
In-Depth Information
G
Definition 17.1 For a function class
, the empirical RA is defined as
n
1
n
R
(
G
)
=
E σ sup
g
σ i g(z i ),
(17.9)
G
i
=
1
where z i ,i
=
1 ,...,n are i.i.d. random variables, and σ i ,i
=
1 ,...,n are i.i.d. ran-
1
2
dom variables, with probability
of taking a value of
+
1or
1.
Based on the RA theory [ 4 , 5 ], the following generalization bounds have been
derived. Here it is assumed that
x
X
,
x
M , and the scoring function f is
F ={ x w T x : w B. }
learned from the linear function class
, for simplicity.
In this case, one has
x X , | f(x) |≤ BM .
Theorem 17.5 Let
x y ) be the
corresponding listwise loss . Given the training data S ={ ( x (i) (i y ), i =
A
denote ListNet or ListMLE , and let L
(f
;
A
1 ,...,n }
,
m
f F ,( x y ) X
× Y ,L A (f ;
x y ) ∈[
0 , 1
]
, with probability at least 1
δ
(0 <δ< 1), the following inequality holds :
2log δ
n
R φ (f ) R φ (f )
2 C A (ϕ)N(ϕ) R ( F ) +
sup
f
,
(17.10)
F
R
where
(
F
) is the RA of the scoring function class ( for the linear scoring function ,
R
ϕ (x) measures the smoothness
of the transformation function ϕ ; C A (ϕ) is an algorithm-dependent factor .
2 BM
n
we have
(
F
)
); N(ϕ)
=
sup x ∈[− BM,BM ]
The expressions of N(ϕ) and C A (ϕ) for ListNet and ListMLE, with respect to
three representative transformation functions are listed in Table 17.1 . 1
From Theorem 17.5 , one can see that when the number of training queries n
approaches infinity, the query-level generalization bound will converge to zero at a
rate of O( 1
n ) . Furthermore, by comparing the query-level generalization bound for
different listwise ranking algorithms, and with regards to different transformation
functions, one can have the following observations.
The query-level generalization bound for ListMLE is much tighter than that for
ListNet, especially when m , the length of the list, is large.
The query-level generalization bound for ListMLE decreases monotonously,
while that of ListNet increases monotonously, with respect to m .
The linear transformation function is the best choice in terms of the query-level
generalization bound in most cases.
1 The three transformation functions are
Linear Functions: ϕ L (x) = ax + b, x ∈[− BM, BM ]
.
Exponential Functions: ϕ E (x) = e ax ,x ∈[− BM, BM ]
.
1
1 + e ax ,x ∈[−
Sigmoid Functions: ϕ S (x) =
BM, BM
]
.
Search WWH ::




Custom Search