Information Technology Reference
In-Depth Information
but rather such a sample size so that complexity selection via SRM gives similar
results to complexity selection via cross-validation,
- we do not explicitly introduce the notion of error stability for the learning algo-
rithm, but this kind of stability is implicitly derived be means of Chernoff-Hoeffding-
like inequalities we write.
- we do not focus on the leave-one-out cross-validation; we consider a more general
n-fold non-stratified cross-validation
(also: more convenient for our purposes); the
leave-one-out case can be read out from our results as a special case.
1.1
Notation Related to Statistical Learning Theory
We keep the notation similar to Vapnik's [2,1].
-
We denote the finite set of samples as:
(
x
1
,
y
1
)
,
(
x
2
,
y
2
)
,...,
(
x
I
,
y
I
)
,
or more shortly by encapsulating pairs as
{
z
1
,
z
2
,...,
z
I
}
,
d
are input points,
y
i
are output values corresponding to them, and
I
is the set size.
y
i
differ depending on the learning task: for
classification
(pattern-
recognition)
y
i
where
x
i
∈
R
.
-
We denote the
set of approximating functions
(models) in the sense of both classi-
fication or regression estimation as:
∈{
1
,
2
,...,
K
}
— finite discrete set, for
regression estimation y
i
∈
R
{
f
(
x
,
ω
)
}
ω
∈
Ω
,
where
Ω
is the domain of parameters of this set of functions, so a fixed
ω
can be
regarded as an index of a specific function in the set.
-
The
risk functional R
:
{
f
(
x
,
ω
)
}
ω
∈
Ω
→
R
∪{
+
∞
}
is defined as
)=
L
f
(
x
,
)
,
y
p
(
x
,
y
)
p
(
x
)
p
(
y
|
x
)
R
(
dy
dx
,
ω
ω
(1)
x
∈
X
y
∈
Y
where
p
(
x
)
is the distribution density of input points,
p
(
y
x
)
is the conditional
density of system/phenomenon outputs
y
given a fixed
x
.
p
(
x
,
y
)=
p
(
x
)
p
(
y
|
x
)
is
the joint distribution density for pairs
(
x
,
y
)
. In practice,
p
(
x
,
y
)
is unknown but
fixed
, and hence we assume the sample
|
to be
i.i.d.
4
L
is the so called
loss function
which measures the discrepancy between the output
y
and the model
f
. For classification,
L
is an indicator function:
{
z
1
,
z
2
,...,
z
I
}
0
,
for
y
=
f
(
x
,
L
f
(
x
,
ω
)
,
y
=
ω
)
;
(2)
1
,
for
y
=
f
(
x
,
ω
)
,
4
Independent, identically distributed.