techniques previously been proposed in the literature (Shawe-Taylor and Sun 2011). For example, the well-known and widely used SMO (Keerthi et al. 2001; Platt 1998) consists in iteratively updating the two Lagrange multipliers $\alpha_i$, $i \in \{1, \ldots, n\}$, that most violate the KKT conditions until convergence is reached.
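To make the idea concrete, the following is a minimal sketch of a simplified SMO loop for the standard soft-margin SVM dual. It selects the second multiplier at random instead of using the maximal-violating-pair heuristic of Keerthi et al. (2001), and the linear kernel, tolerances, and stopping rule are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Simplified SMO for the soft-margin SVM dual with a linear kernel."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = X @ X.T                                   # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            E_i = (alpha * y) @ K[:, i] + b - y[i]          # prediction error on sample i
            # update only if alpha_i violates the KKT conditions (within tol)
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(n) if k != i])   # random second multiplier
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                # box bounds keeping the pair feasible for the dual constraints
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # bias update so the KKT conditions hold for the changed pair
                b1 = b - E_i - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X                           # primal weights for the linear kernel
    return w, b
```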
The L2-SVM method described above does not perform any dimensionality reduction. This is, however, undesirable in some practical applications where it is necessary to highlight relevant features, discard noisy ones, and reduce the computational burden of classifying new samples. To address this issue, the replacement of the L2-Norm term with the Manhattan distance or L1-Norm ($\|\mathbf{w}\|_1 = \sum_{j=1}^{d} |w_j|$) has been proposed (Tibshirani 1996):
$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \|\mathbf{w}\|_1 + C\, \mathbf{1}_n^\top \boldsymbol{\xi}
\quad \text{s.t.} \quad
Y\left(X\mathbf{w} + b\, \mathbf{1}_n\right) \geq \mathbf{1}_n - \boldsymbol{\xi}, \quad \boldsymbol{\xi} \geq \mathbf{0}_n
\qquad (6.2)
$$
This L1 regularization introduces an automatic feature selection effect which is embedded in the learning process.
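As a rough illustration of this effect, the sketch below compares the number of non-zero weights obtained with an L1 and an L2 penalty. scikit-learn's LinearSVC (which minimises a squared hinge loss rather than the exact formulation of Eq. (6.2)) and the synthetic data set are stand-ins introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: only 5 of the 50 features carry class information.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

l1 = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1).fit(X, y)
l2 = LinearSVC(penalty="l2", loss="squared_hinge", dual=False, C=0.1).fit(X, y)

print("L1 non-zero weights:", np.count_nonzero(l1.coef_))  # typically only a handful
print("L2 non-zero weights:", np.count_nonzero(l2.coef_))  # typically all 50
```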
The optimization problem using the L1-Norm can be reformulated in the following way, by splitting the weight vector as $\mathbf{w} = \mathbf{w}^+ - \mathbf{w}^-$ with $\mathbf{w}^+, \mathbf{w}^- \geq \mathbf{0}_d$:
$$
\min_{\mathbf{w}^+,\, \mathbf{w}^-,\, b,\, \boldsymbol{\xi}} \;\; \mathbf{1}_d^\top \left(\mathbf{w}^+ + \mathbf{w}^-\right) + C\, \mathbf{1}_n^\top \boldsymbol{\xi}
\quad \text{s.t.} \quad
Y\left(X(\mathbf{w}^+ - \mathbf{w}^-) + b\, \mathbf{1}_n\right) \geq \mathbf{1}_n - \boldsymbol{\xi}, \quad \boldsymbol{\xi} \geq \mathbf{0}_n, \quad \mathbf{w}^+, \mathbf{w}^- \geq \mathbf{0}_d
\qquad (6.3)
$$
The solution to this problem provokes many of the weights $w_j$ to become zero, which is useful for our needs, as the calculations required to perform the FFP are directly related to the number of weights different from zero. In other words, the L1-SVM allows dimensionality reduction to be performed thanks to the characteristics of the Manhattan norm: several weights $w_j$ will (generally) be nullified during the learning procedure, in contrast with the conventional L2-SVM, where $w_j \neq 0$ for all $j = 1, \ldots, d$ in the final model. The disadvantage of this approach is that the effective tools developed for the conventional L2-SVM cannot be applied for solving it. Note also that the conventional kernel trick cannot be exploited in the previous formulation, thus the effectiveness of linear models assumes an ever greater importance. The problem in Eq. (6.3) is, however, a standard Linear Programming problem, for which many tools have been developed throughout the years (Press et al. 2007).
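As an example of how Eq. (6.3) maps onto an off-the-shelf LP solver, the following sketch sets it up for scipy.optimize.linprog with the decision vector stacked as $[\mathbf{w}^+, \mathbf{w}^-, b, \boldsymbol{\xi}]$; the toy data, the value of $C$, and the sparsity threshold are assumptions made only for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d, C = 60, 10, 1.0                        # samples, features, soft-margin parameter
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)  # 2 informative features

# Decision vector z = [w+ (d), w- (d), b (1), xi (n)]
c = np.concatenate([np.ones(d), np.ones(d), [0.0], C * np.ones(n)])

# Margin constraints y_i (x_i . (w+ - w-) + b) + xi_i >= 1, rewritten as A_ub z <= b_ub
YX = y[:, None] * X
A_ub = np.hstack([-YX, YX, -y[:, None], -np.eye(n)])
b_ub = -np.ones(n)

bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n  # w+, w-, xi >= 0; b free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w = res.x[:d] - res.x[d:2 * d]               # recover w = w+ - w-
b = res.x[2 * d]
print("non-zero weights:", np.flatnonzero(np.abs(w) > 1e-6))
```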
The extension of the L1-SVM binary problem to multiple classes for our application can be achieved using an OVA approach (MultiClass L1-SVM, MC-L1-SVM), as we also did in Chap. 5, but with some modifications in order to take advantage of the intrinsic feature selection mechanisms of the model. The final classification is carried out by considering the output of $m$ linear models:
$$
f_c(\mathbf{x}) = \mathbf{w}_c^\top \mathbf{x} + b_c, \qquad c \in \{1, \ldots, m\}
\qquad (6.4)
$$
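A possible sketch of this OVA scheme is given below: each class receives its own L1-regularised linear model and the predicted label is the one with the largest $f_c(\mathbf{x})$. scikit-learn's L1-penalised LinearSVC (squared hinge loss) is used here as a stand-in for the exact L1-SVM of Eq. (6.3), and the data set and number of classes are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

m = 4                                            # number of classes
X, t = make_classification(n_samples=400, n_features=30, n_informative=6,
                           n_classes=m, n_clusters_per_class=1, random_state=0)

models = []
for c in range(m):
    y_c = np.where(t == c, 1, -1)                # one-vs-all relabelling
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5)
    clf.fit(X, y_c)
    models.append(clf)
    print(f"class {c}: {np.count_nonzero(clf.coef_)} non-zero weights")

# f_c(x) = w_c^T x + b_c for every class; final label = argmax_c f_c(x)
scores = np.column_stack([mdl.decision_function(X) for mdl in models])
pred = scores.argmax(axis=1)
print("training accuracy:", (pred == t).mean())
```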
Each binary model is learned independently, and not all the zero-valued weights for one class are also zero for the others. For such purposes, we can define the set