Information Technology Reference
In-Depth Information
•
If we only have a limited budget to label
C
documents in total, according to Theo-
rem
17.6
, there is an optimal trade off between the number of training queries and
that of training documents per query. This is consistent with previous empirical
findings in [
18
].
Actually one can attain the optimal trade off by solving the following optimiza-
tion problem:
2
M
2
log
(
δ
)
n
min
n,m
(
1
)
,...,m
(n)
D(L
◦
F
,n)
+
D
L
,
m
(i)
2
n
n
2
M
2
log
δ
m
(i)
n
2
1
n
+
+
◦
F
i
=
1
i
=
1
n
s
.
t
.
m
i
=
C.
i
=
1
This optimization problem is easy to solve. For example, if
VC(
F
)
=
V
,
for
th
e
pairwise
0-1 loss, the solution to the optimization problem is
n
∗
=
+
√
2lo
g
(
4
/δ)
c
1
√
2
V
c
1
√
V
√
C
,
m
(i)
∗
≡
C
n
∗
, where
c
1
is a constant. From this result we note
the following.
n
∗
decreases with the increasing capacity of the function class. That is, we should
label fewer queries and more documents per query when the hypothesis space is
larger.
•
For fixed hypothesis space,
n
∗
increases with the confidence level
δ
. That is, we
should label more query if we want the bound to hold with a larger probability.
•
The above findings can be used to explain the behavior of existing pairwise rank-
ing algorithms, and can be used to guide the construction of training set for learning
to rank.
17.3 Algorithm-Dependent Generalization Bound
Note that the uniform generalization bound is valid for any function in the hypoth-
esis space under investigation, no matter whether the function is the optimal one
learned by a given algorithm or not. Therefore, it is understandable that the bound
might not be very tight. In contrast, it is possible that we can get a tighter bound for
the optimal function learned by the algorithm. This is just the motivation of devel-
oping the algorithm-dependent generalization bounds.
2
Again, in this section, we
2
Note that the disadvantage of algorithm-dependent bounds lies in that they can only be used for
specific algorithms, and may not be derived for every algorithm.