Table 6.5 Comparison of the EX-SMO algorithm training time (in hours) against SMLP and SMO

                      L1-SVM    L1-L2-SVM (EX-SMO), λ =                             L2-SVM
                      (SMLP)    0.001   0.005   0.01    0.05   0.1    0.5    1      (SMO)
  Training time (h)   2.54      2.37    1.97    1.54    1.47   1.39   1.32   1.14   1.13
(Flannery et al. 1992) and L2 SVM (the conventional SMO (Keerthi et al. 2001)).
Table 6.5 shows that EX-SMO performs comparably to SMO on L2 problems and
outstrips SMLP on training L1 SVMs, although EX-SMO's effectiveness tends to decrease
as λ → 0: this is expected, since we are using a QP tool to solve an (almost) LP problem,
and this suboptimal approach leads to a slight loss in the algorithm's performance.
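For reference, the generic textbook primal formulations of the two extreme cases make the QP-versus-LP remark explicit; these are standard forms and not necessarily the exact problem solved by EX-SMO in this chapter.

```latex
% Generic primal problems (standard textbook forms, shown only to illustrate
% the remark above): the L1-SVM is a linear program, the L2-SVM a quadratic program.
\begin{align}
  \text{L1-SVM (LP):} \quad & \min_{w,b,\xi}\ \|w\|_1 + C \sum_{i} \xi_i
    \quad \text{s.t.}\ y_i (w^\top x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0, \\
  \text{L2-SVM (QP):} \quad & \min_{w,b,\xi}\ \tfrac{1}{2}\|w\|_2^2 + C \sum_{i} \xi_i
    \quad \text{s.t.}\ y_i (w^\top x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0.
\end{align}
```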
Table 6.6 reports the confusion matrices for the L1, L1-L2 and L2 SVMs obtained on
the D2T dataset. We do not present results for the different solvers, as they show no
differences. Contrary to the expected behavior, we observe a consistent classification
performance across all the methods in terms of accuracy (with variations below 1%).
However, we found an interesting result: the highest accuracy (96.91%) is achieved
when λ = 0.05, rather than with MC-L2-SVM or MC-L1-SVM. This finding is possibly
linked to the fact that this intermediate solution selects relevant features and filters
noisy ones, two aspects that cannot be properly handled in the extreme cases
λ = 0 and λ = 1, respectively.
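The effect of such an intermediate choice can be reproduced, in spirit, with off-the-shelf tools. The sketch below is not the chapter's EX-SMO algorithm: it uses scikit-learn's SGDClassifier with an elastic-net penalty, whose l1_ratio plays a role loosely analogous to λ here, on synthetic stand-in data, only to show how sweeping the L1/L2 mix trades accuracy against sparsity.

```python
# Minimal sketch (not EX-SMO): sweep the mix between L1 and L2 regularization
# on a hinge-loss linear classifier and watch accuracy vs. number of selected
# features. The data set is a synthetic placeholder, not the HAR data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=561, n_informative=50,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# l1_ratio = 1.0 -> pure L1 (sparse solution), l1_ratio = 0.0 -> pure L2 (dense).
for l1_ratio in (1.0, 0.95, 0.5, 0.05, 0.0):
    clf = SGDClassifier(loss="hinge", penalty="elasticnet",
                        alpha=1e-4, l1_ratio=l1_ratio, random_state=0)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    n_selected = np.count_nonzero(clf.coef_)
    print(f"l1_ratio={l1_ratio:4.2f}  accuracy={acc:.3f}  "
          f"non-zero weights={n_selected}/{clf.coef_.size}")
```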
Moreover, in Table 6.7 we collect the experimental results for all the values of
λ ≥ 0 and when λ → ∞. They include classification accuracy, dimensionality
reduction (overall ρ and average ρ̄), and grouping ability σ. First, we can observe
that ρ decreases (or increases) with λ. This corroborates the dimensionality
reduction capability of L1-SVM and, equivalently, of L1-L2-SVM with small values of
λ. However, although the dimensionality reduction capability is maximized for L1-SVM,
feature grouping effects, namely the ability of the algorithm to select (or neglect)
clusters of highly cross-correlated inputs, are usually absent when λ → 0, although
they are desirable in order to gain more insight into the informative content of each
input (Segal et al. 2003). In order
to evaluate whether L1-L2-SVM is able to overcome these L1-related issues, as
expected from the literature, we computed the correlation matrix M_C ∈ R^(d×d) of X
and we created feature clusters by joining the 10 most cross-correlated inputs. Our
purpose was to verify the percentage σ of cluster features selected (or neglected)
by the different procedures (ranging from L1-SVM to L2-SVM): a high value of σ is
obviously desirable. The results are also shown in the table, and it is thus worth
noting that a very small subset of features (L1-SVM) is necessary to guarantee
an acceptable classification performance, though grouping effects are limited. By
balancing the effects of the L1 and L2 regularization terms, we can decide whether we
want a higher accuracy, a smaller number of features or a higher grouping ability.
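The sketch below is one possible reading of this grouping-ability check, not the authors' code: it builds the correlation matrix of X, grows clusters from the 10 most cross-correlated inputs, and reports the fraction of clusters whose members are jointly selected or jointly neglected by a trained weight vector w (for example, clf.coef_ from the previous sketch). The function name and the cluster-growing heuristic are illustrative assumptions.

```python
# Rough sketch of the grouping-ability metric described above (our interpretation).
import numpy as np

def grouping_ability(X, w, cluster_size=10, n_clusters=20):
    """Fraction of correlated-feature clusters whose members are either all
    selected (non-zero weight) or all neglected (zero weight) by w."""
    M_C = np.abs(np.corrcoef(X, rowvar=False))   # feature correlation matrix, d x d
    np.fill_diagonal(M_C, 0.0)
    selected = np.abs(np.ravel(w)) > 1e-8        # which features carry non-zero weight

    consistent = 0
    for _ in range(n_clusters):
        # Seed a cluster at the most correlated remaining pair, then grow it with
        # the features most correlated to the seed feature.
        seed = np.unravel_index(np.argmax(M_C), M_C.shape)[0]
        neighbours = np.argsort(M_C[seed])[::-1][:cluster_size - 1]
        cluster = np.append(neighbours, seed)
        if selected[cluster].all() or (~selected[cluster]).all():
            consistent += 1
        M_C[cluster, :] = 0.0                     # do not reuse these features
        M_C[:, cluster] = 0.0
    return consistent / n_clusters

# Hypothetical usage with the classifier from the previous sketch:
#   sigma = grouping_ability(X_train, clf.coef_)
#   print(f"grouping ability sigma = {sigma:.2f}")
```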
In the particular case of HAR using smartphones, as we are targeting the minimization
of the computational burden to maximize battery duration, and we are only
partially interested in gaining insight into the information content of each input, L1-L2