Information Technology Reference
In-Depth Information
While the above work uses the concept of the VC dimension for general-
ization analysis, the notion of the bipartite rank-shatter coefficient, denoted as
r(
F
,m
+
,m
−
)
,isusedin[
2
]. This new notion has a similar meaning to the shatter-
ing coefficient in classification [
16
,
17
].
Using the bipartite rank-shatter coefficient as a tool, the following theorem has
been proven in [
2
]. For the class of linear scoring functions in the one-dimensional
feature space,
r(
,m
+
,m
−
)
is a constant, regardless of the values of
m
+
F
and
m
−
. In this case, the bound converges to zero at a rate of
O(
max
1
√
m
+
1
{
√
m
−
}
)
,
and is therefore tighter than the bound based on the VC dimension (as given in
Theorem
17.1
). For the class of linear scoring functions in the
d
-dimensional fea-
ture space (
d>
1),
r(
,
,m
+
,m
−
)
is of the order
O((m
+
m
−
)
d
)
, and in this case
the bound
has a s
im
ilar con
vergence rate to that based on the VC dimension, i.e.,
O(
max
F
log
(m
+
)
m
+
,
log
(m
−
)
m
−
{
}
)
.
F
X
Theorem 17.2
Let
be the class of real-valued functions on
,
then with prob-
ability at least
1
−
δ
(0
<δ<
1),
8
(m
+
+
m
−
)(
log
δ
+
R
0
(f )
−
R
0
(f )
≤
,
2
m
+
,
2
m
−
))
log
r(
F
∀
f
∈
F
,
.
m
+
m
−
(17.2)
Note that the above two results are only applicable to the case of bipartite ranking.
Some other researchers have extended these results to more general cases, e.g., the
case of
k
-partite ranking [
9
,
15
].
Here we take [
9
] as an example. In this work, the labels of documents are no
longer assumed to be positive or negative. Instead, a document is preferred to an-
other one if and only if the label of the former document is larger than that of the
latter document. Specifically, given two documents
x
u
and
x
v
whose labels are
y
u
and
y
v
, the pairwise 0-1 loss is defined as
I
{
(y
u
−
y
v
)(f (x
u
)
−
f(x
v
))<
0
}
. In other words,
if the ranking result given by function
f
is in the same order of that given by the
ground-truth label, the loss is zero; otherwise the loss is one. Based on the pairwise
0-1 loss, the expected risks and empirical risks are defined as below.
R
0
(f )
=
E
[
I
}
]
,
(17.3)
{
(y
u
−
y
v
)(f (x
u
)
−
f(x
v
))<
0
m
m
2
m(m
R
0
(f )
=
I
.
(17.4)
{
(y
u
−
y
v
)(f (x
u
)
−
f(x
v
))<
0
}
−
1
)
u
=
1
v
=
u
+
1
Using the properties of U-statistics [
9
], the following generalization bound is
obtained with respect to the aforementioned empirical and expected risks.