Information Technology Reference
In-Depth Information
but rather such a sample size so that complexity selection via SRM gives similar
results to complexity selection via cross-validation,
- we do not explicitly introduce the notion of error stability for the learning algo-
rithm, but this kind of stability is implicitly derived be means of Chernoff-Hoeffding-
like inequalities we write.
- we do not focus on the leave-one-out cross-validation; we consider a more general
n-fold non-stratified cross-validation (also: more convenient for our purposes); the
leave-one-out case can be read out from our results as a special case.
1.1
Notation Related to Statistical Learning Theory
We keep the notation similar to Vapnik's [2,1].
- We denote the finite set of samples as:
( x 1 , y 1 ) , ( x 2 , y 2 ) ,..., ( x I , y I ) ,
or more shortly by encapsulating pairs as
{
z 1 , z 2 ,..., z I }
,
d are input points, y i are output values corresponding to them, and I
is the set size. y i differ depending on the learning task: for classification (pattern-
recognition) y i
where x i
R
.
- We denote the set of approximating functions (models) in the sense of both classi-
fication or regression estimation as:
∈{
1 , 2 ,..., K
}
— finite discrete set, for regression estimation y i
R
{
f ( x ,
ω
)
} ω Ω ,
where
Ω
is the domain of parameters of this set of functions, so a fixed
ω
can be
regarded as an index of a specific function in the set.
- The risk functional R :
{
f ( x ,
ω
)
} ω Ω R ∪{
+
}
is defined as
)=
L f ( x ,
) , y p ( x , y )
p ( x ) p ( y | x )
R (
dy dx ,
ω
ω
(1)
x X
y Y
where p ( x ) is the distribution density of input points, p ( y
x ) is the conditional
density of system/phenomenon outputs y given a fixed x . p ( x , y )= p ( x ) p ( y
|
x ) is
the joint distribution density for pairs ( x , y ) . In practice, p ( x , y ) is unknown but
fixed , and hence we assume the sample
|
to be i.i.d. 4 L is the so called
loss function which measures the discrepancy between the output y and the model
f . For classification, L is an indicator function:
{
z 1 , z 2 ,..., z I }
0 , for y = f ( x ,
L f ( x , ω ) , y =
ω
) ;
(2)
1 , for y
= f ( x ,
ω
) ,
4
Independent, identically distributed.
 
Search WWH ::




Custom Search