We define a set $S$ which includes all the indexes of the features whose weights are equal to zero for all the classes in the following way:

$$S = \left\{\, j \;\middle|\; w_{c,j} = 0, \;\; \forall\, c \in \{1, \ldots, m\} \,\right\}.$$

In that sense, all the zero-valued weights can be removed from the FFP computation. We can also compute the effective feature dimensionality reduction as the fraction of selected features $\rho = \frac{d - |S|}{d}$, where $|S|$ indicates the cardinality of the set. Additionally, we can measure the feature reduction which is possible per class. This measure gives us an idea of how fast the FFP computation can be. So we also define the mean feature dimensionality reduction as

$$\bar{\rho} = \frac{1}{m} \sum_{c=1}^{m} \frac{d - |S_c|}{d},$$

where $S_c$ are the sets of indexes of zero-valued features per class.
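Both measures are straightforward to compute once the per-class weights are available. The following is a minimal sketch, assuming the weights $w_{c,j}$ are stored as an $m \times d$ NumPy array (a storage layout chosen here purely for illustration); the tolerance parameter, also our addition, treats numerically tiny weights as zero:

```python
import numpy as np

def feature_reduction_stats(W, tol=0.0):
    """Reduction measures for an (m, d) weight matrix W.

    W   : per-class weights w_{c,j} (hypothetical layout).
    tol : weights with |w| <= tol are treated as zero (our addition).
    """
    zero = np.abs(W) <= tol            # True where w_{c,j} = 0
    m, d = W.shape

    # S: indexes j whose weight is zero for *all* classes
    S = np.flatnonzero(zero.all(axis=0))
    rho = (d - len(S)) / d             # fraction of selected features

    # mean per-class reduction: (1/m) * sum_c (d - |S_c|) / d
    rho_bar = np.mean([(d - zero[c].sum()) / d for c in range(m)])
    return S, rho, rho_bar
```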
6.3 L1-L2 SVM Algorithm
The conventional L2-SVM approach is considered one of the state-of-the-art methods for classification, and several effective techniques have been developed throughout the years for training these models (Fan et al. 2008; Ghio et al. 2012; Keerthi et al. 2001; Platt 1998; Shalev-Shwartz et al. 2007). While it allows deriving sparse classifiers (i.e. models described by exploiting a limited subset of training patterns), L2-SVM (Vapnik 1998) does not perform any feature reduction, which becomes a limitation for the analysis of the dataset and the interpretability of the informative content of the inputs. On the other hand, L1-SVM introduces an automatic dimensionality reduction effect into the learning process. However, despite this being very appealing for this task, L1-SVM is also characterized by some drawbacks:
1. No feature grouping effect characterizes L1 models, i.e. clusters of highly cross-correlated inputs are usually not entirely selected by the training procedure (Segal et al. 2003);
2. When the dimensionality of the dataset is remarkably larger than the number of samples, L1 models are able to exploit only a number of inputs at most equal to the cardinality of the training set, which could be restrictive in some applications (Zou and Hastie 2005);
3. L1-SVMs require custom ad-hoc algorithms to be developed for classifier training (Friedman et al. 2010), which do not exploit the huge effort spent in the last decades on designing effective solvers for the conventional SVM (e.g. Keerthi et al. 2001; Platt 1998).
In order to deal with the first two points above, an SVM which combines the L1- and L2-norms has been proposed by Zou and Hastie (2005). It enhances feature grouping effects in model training, properly balances sparsity and dimensionality reduction, and combines the effectiveness of the L2 approach with the feature selection characteristics of L1-SVMs.
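Although the full formulation is developed in the following pages, the combined penalty can be sketched, in our own notation, as a hinge-loss objective regularized by both norms of the weight vector (the trade-off hyperparameters $\lambda_1$ and $\lambda_2$ are introduced here for illustration):

$$\min_{\mathbf{w},\, b} \;\; \sum_{i=1}^{n} \left[\, 1 - y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \right]_{+} \; + \; \lambda_1 \, \| \mathbf{w} \|_1 \; + \; \lambda_2 \, \| \mathbf{w} \|_2^2 .$$

Setting $\lambda_1 = 0$ recovers the conventional L2-SVM, setting $\lambda_2 = 0$ yields the L1-SVM, and intermediate values trade sparsity against the grouping effect.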
Moreover, to cope with the third issue, we present a new training tool that efficiently deals with SVMs based on the L1-, L2- and L1-L2-norms. The proposal builds on the efficient solvers developed in the last decades for L2-SVM (e.g. Keerthi et al. 2001; Platt 1998), and thus can be implemented with minimal effort.
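As a point of reference only (the solver proposed in this chapter is developed below and is not tied to any particular library), the three penalty regimes can be reproduced with off-the-shelf tools. The following sketch uses scikit-learn, where the L1-L2 case is approximated by a stochastic hinge-loss solver:

```python
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier

# Conventional L2-SVM: dense weight vector, no feature selection.
l2_svm = LinearSVC(penalty="l2")

# L1-SVM: sparse weights; selects at most n features when d >> n.
l1_svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False)

# L1-L2 (elastic-net) penalty on a hinge loss; l1_ratio trades
# sparsity (L1) against the grouping effect (L2).
l1l2_svm = SGDClassifier(loss="hinge", penalty="elasticnet",
                         alpha=1e-4, l1_ratio=0.5)

# Each model is then trained with its fit(X, y) method.
```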