where $b_i^j \in \{0, 1\}$ is a binary valued variable and therefore $\beta_i$ can be expressed as an integer variable such that $0 \leq \beta_i \leq 2^k - 1$. Since each $b_i^j$ belongs to a finite set, for a fixed training set of cardinality $n$ and a fixed kernel (with its hyperparameter), the number of classifiers that we can represent is finite. According to the notation of (Vapnik 1995) we call $N_f$ the number of classifiers that we can build with $b_i^j$, $i \in \{1, \ldots, n\}$ and $j \in \{0, \ldots, k-1\}$. Consequently, we can exploit Vapnik's well-known generalization bounds for finite hypothesis sets (Vapnik 1995), which use $N_f$ as a measure of complexity. Let then $d_\beta$ be the number of nonzero parameters ($\beta_i \neq 0$); then
$N_f(k, d_\beta) = \sum_{i=1}^{d_\beta} \binom{n}{i} \left[ \left( 2^{k} - 1 \right)^{i} - \left( 2^{k-1} - 1 \right)^{i} \right], \qquad (5.10)$
where we take into account the fact that if all the parameters are even numbers, they can be divided by two without changing the class estimate. If, instead, $d_b$ is the number of nonzero bits ($b_i^j \neq 0$), then
$N_f(k, d_b) = \sum_{i=1}^{d_b} \binom{nk}{i}. \qquad (5.11)$
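To make the two counts concrete, the following is a minimal Python sketch (not taken from the original work) that evaluates Eqs. (5.10) and (5.11) as reconstructed above with exact integer arithmetic; the function names n_f_beta and n_f_bits and the toy values of n, k and d are ours.

from math import comb

def n_f_beta(n: int, k: int, d_beta: int) -> int:
    """Eq. (5.10): classifiers with at most d_beta nonzero k-bit parameters.
    For each choice of i nonzero positions out of n there are (2^k - 1)^i
    value assignments, minus the (2^(k-1) - 1)^i assignments in which every
    value is even (dividing them all by two leaves the class estimate
    unchanged)."""
    return sum(comb(n, i) * ((2**k - 1)**i - (2**(k - 1) - 1)**i)
               for i in range(1, d_beta + 1))

def n_f_bits(n: int, k: int, d_b: int) -> int:
    """Eq. (5.11): classifiers with at most d_b nonzero bits among the n*k
    bits b_i^j."""
    return sum(comb(n * k, i) for i in range(1, d_b + 1))

# Toy setting: n = 100 training samples, k = 4 bits per parameter.
print(n_f_beta(100, 4, 10))   # complexity measured through d_beta
print(n_f_bits(100, 4, 10))   # complexity measured through d_b

Even for moderate $n$ and $k$ these counts grow very quickly, which is why the bound (5.12) below only involves $\ln N_f$.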
In the SLT and Structural Risk Minimization (SRM) frameworks (Vapnik 1995), a good generalization capability on previously unseen data can be guaranteed (Anguita et al. 2012; Vapnik 1995) if a nested structure of the available hypothesis sets with increasing complexity is defined ($H_1 \subseteq H_2 \subseteq \ldots$). In this way, the generalization capability of a model can be controlled by choosing the set that achieves the best compromise between complexity and learning error.

In our case the complexity of the class can be defined through two quantities, $k$ and $d_\beta$ (or $d_b$). Starting from the set $H_1$ with complexity $N_f(1, 1)$, we can increase the complexity by increasing the number of bits, $k \rightarrow k + 1$, or by decreasing the sparsity of the representation, $d_\beta \rightarrow d_\beta + 1$ (or $d_b \rightarrow d_b + 1$). In other words, we have to search for the best class, which is as sparse as possible (smaller $d_\beta$ or $d_b$) and represented with the minimum number of bits $k$.
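As a small illustration of this nesting, the self-contained check below verifies numerically that enlarging either $k$ or $d_\beta$ never decreases the complexity measure $N_f(k, d_\beta)$ of Eq. (5.10) as reconstructed above; the loop ranges and the value of $n$ are arbitrary toy choices of ours.

from math import comb

# Same counting function as in the previous sketch (Eq. 5.10, our reconstruction).
def n_f_beta(n, k, d_beta):
    return sum(comb(n, i) * ((2**k - 1)**i - (2**(k - 1) - 1)**i)
               for i in range(1, d_beta + 1))

# Nested structure H_1 ⊆ H_2 ⊆ ...: growing either k or d_beta never
# shrinks the complexity measure N_f(k, d_beta).
n = 100  # toy training-set cardinality
for k in range(1, 5):
    for d in range(1, 5):
        here = n_f_beta(n, k, d)
        assert here <= n_f_beta(n, k + 1, d)  # one more bit
        assert here <= n_f_beta(n, k, d + 1)  # one more nonzero parameter
        print(f"k={k}, d_beta={d}: N_f = {here}")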
Obviously, a classifier that belongs to a space with smaller complexity is also more energy efficient than one that belongs to a space with higher complexity, as will also be shown in the subsequent experiments.
Increasing the complexity of the space also has a direct consequence on the generalization ability of the classifier since, according to the bound of Vapnik (Vapnik 1995), the following holds with probability $(1 - \delta)$:
$\pi \leq \nu + \sqrt{ \dfrac{ \ln N_f(k, d) - \ln(\delta) }{ 2l } } \qquad (5.12)$
where $\pi$ is the generalization error and $\nu$ is the error obtained by the learning machine on the dataset.
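The bound (5.12) can be turned directly into an SRM-style selection rule: among the candidate hypothesis spaces, pick the one that minimizes its right-hand side. The sketch below assumes, as in Vapnik's finite-hypothesis-set bound, that $l$ denotes the number of training samples; the empirical errors $\nu$ listed for each $(k, d_\beta)$ pair are placeholder values that would in practice come from training the classifier in the corresponding space, and n_f_beta repeats the hypothetical helper from the earlier sketch.

from math import comb, log, sqrt

# Same counting function as in the first sketch (Eq. 5.10, our reconstruction).
def n_f_beta(n, k, d_beta):
    return sum(comb(n, i) * ((2**k - 1)**i - (2**(k - 1) - 1)**i)
               for i in range(1, d_beta + 1))

def vapnik_bound(nu, n_f, l, delta=0.05):
    """Right-hand side of (5.12): empirical error nu plus the complexity
    term for a finite hypothesis set of cardinality n_f, l training
    samples and confidence 1 - delta."""
    return nu + sqrt((log(n_f) - log(delta)) / (2 * l))

# Placeholder empirical errors nu for a few (k, d_beta) candidates; in
# practice each value comes from training the classifier in that space.
l = n = 100
candidates = {(1, 5): 0.20, (2, 5): 0.12, (4, 5): 0.10, (4, 20): 0.08}
best = min(candidates,
           key=lambda kd: vapnik_bound(candidates[kd], n_f_beta(n, *kd), l))
print("SRM choice (k, d_beta):", best)

With these toy numbers the rule prefers the 2-bit, 5-parameter space over the more accurate but much richer alternatives, which is exactly the sparsity-versus-error trade-off described above.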
 