order to solve our multiclass HAR problem. In a similar way, we present in Sect. 6.3 the combined algorithm (MultiClass L1-L2-SVM (MC-L1-L2-SVM)), which merges the effectiveness of L2 models and the feature selection characteristics of L1 solutions for HAR. Moreover, we describe our proposed training algorithm (Extended SMO (EX-SMO)). Experimental results regarding the proposed SVM approaches, the addition of gyroscopes, and the feature selection mechanisms are presented in Sect. 6.4. Finally, we summarize the chapter in Sect. 6.5.
6.2 L1-Norm and L2-Norm SVMs for Activity Recognition
Our target is to design a model that can run effectively on smartphones with limited battery life and computational restrictions. We must therefore identify the simplest possible classifier, exploiting the smallest set of features, that guarantees the best performance/computational-burden ratio. For these purposes, we pursue the use of linear models, which employ only those selected inputs that are crucial to attain sufficient classification accuracy. In this section we formulate the OVA SVMs with the L1- and L2-Norms.
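To fix ideas, the following is a minimal sketch of how such an OVA linear classifier scores a feature vector at prediction time; the class count, feature dimension, and weights below are invented placeholders, not values from this chapter:

```python
import numpy as np

# Hypothetical one-vs-all (OVA) linear scoring: one weight vector w_c and
# bias b_c per activity class, trained separately (placeholders here).
rng = np.random.default_rng(0)
n_classes, d = 6, 561                 # assumed sizes, for illustration only
W = rng.normal(size=(n_classes, d))   # rows are the per-class weight vectors
b = rng.normal(size=n_classes)

def predict(x):
    """Pick the class whose linear score f_c(x) = w_c^T x + b_c is largest."""
    return int(np.argmax(W @ x + b))

x = rng.normal(size=d)                # one (synthetic) feature vector
print(predict(x))
```

At prediction time the cost is a single matrix-vector product, which is what makes linear OVA models attractive under the smartphone constraints above.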
In the framework of supervised learning and in the case of binary classification problems, the goal is to approximate the relationship between examples from a set $X$ composed of elements $x_i \in \mathbb{R}^d$ and a set $Y$ which contains the output targets $y_i = \pm 1$. This relationship is encapsulated by a fixed, but unknown, probability distribution $P$. A training set $D_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ is sampled according to $P$. The learning algorithm maps $D_n$ to $f \in \mathcal{F}$ with a linear separator in the original space, $f(x) = w^T x + b$. Moreover, the accuracy in representing the hidden relationship $P$ is measured with reference to a loss function $\ell(f(x), y)$.
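As a concrete toy illustration of these objects (all data below is invented), a training set $D_n$ and a candidate linear separator can be written as:

```python
import numpy as np

# Toy training set D_n = {(x_i, y_i)}: n examples in R^d with labels +/-1.
rng = np.random.default_rng(1)
n, d = 8, 3
X = rng.normal(size=(n, d))          # rows are the examples x_i
y = np.where(X[:, 0] > 0, 1, -1)     # synthetic targets y_i in {-1, +1}

w = rng.normal(size=d)               # some linear separator f(x) = w^T x + b
b = 0.0
f = X @ w + b                        # decision values f(x_i) for every example
print(np.sign(f))                    # predicted labels
print(y)                             # true targets, for comparison
```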
In general, the hard loss function $\ell_H(f(x), y) = (1 - y\,\operatorname{sign}(f(x)))/2$ seems the most natural choice, as it counts the number of misclassifications, but unfortunately it is non-convex. For this reason the hinge loss function $\xi(f(x), y) = [1 - y f(x)]_+$ is exploited instead (Vapnik 1998). It is possible to introduce a regularization term in order to adjust the size of the class. In this case, we choose the Euclidean norm $\|w\|_2^2 = \sum_{j=1}^{d} w_j^2$, also known as the L2-Norm (Tikhonov and Arsenin 1978).
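Before moving to the optimization problem, the two losses can be compared numerically; the following is a small sketch of our own (the decision values and labels are arbitrary):

```python
import numpy as np

def hard_loss(f, y):
    # l_H(f(x), y) = (1 - y * sign(f(x))) / 2: 1 on a misclassification, else 0
    return (1 - y * np.sign(f)) / 2

def hinge_loss(f, y):
    # xi(f(x), y) = [1 - y f(x)]_+ : a convex upper bound on the hard loss
    return np.maximum(0.0, 1 - y * f)

f = np.array([-2.0, -0.5, 0.3, 1.5])   # decision values f(x_i)
y = np.array([ 1.0,  1.0, 1.0, 1.0])   # true targets
print(hard_loss(f, y))                 # [1.  1.  0.  0. ]
print(hinge_loss(f, y))                # [3.  1.5 0.7 0. ]
```

Note how the hinge loss still penalizes the third example, which is correctly classified but falls inside the margin.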
This yields the primal formulation using the L2-Norm in the minimization problem:
$$\min_{w,\,b,\,\xi} \ \frac{1}{2}\|w\|_2^2 + C\,\mathbf{1}_n^T \xi \quad \text{s.t.} \quad Y(Xw + b\,\mathbf{1}_n) \geq \mathbf{1}_n - \xi, \quad \xi \geq \mathbf{0}_n, \qquad (6.1)$$

where $\xi_i = \xi(f(x_i), y_i)$, $X = [x_1 | \ldots | x_n]^T$, $y = [y_1 | \ldots | y_n]^T$, and $Y = \operatorname{diag}(y)$ ($Y$ is a diagonal matrix where the elements on the diagonal are the $y_i$, $i \in \{1, \ldots, n\}$).
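Problem (6.1) is a convex quadratic program that any off-the-shelf QP solver can handle. As a rough illustration only (this is not the chapter's EX-SMO trainer), the equivalent unconstrained form $\min_{w,b} \frac{1}{2}\|w\|_2^2 + C \sum_i [1 - y_i(w^T x_i + b)]_+$ can be attacked with subgradient descent:

```python
import numpy as np

def train_l2_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y*(Xw + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # examples with nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny linearly separable toy problem (invented data).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_l2_svm(X, y)
print(np.sign(X @ w + b))   # should reproduce y on this toy set
```

The learning rate, epoch count, and data are arbitrary choices for the sketch; a production trainer would rely on a dedicated solver such as SMO.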
Also, by introducing $n$ Lagrange multipliers $\alpha$, we can obtain the dual formulation.
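For reference, a textbook form of this dual under the notation above (the chapter's own statement may differ in details) is

$$\max_{\alpha} \ \mathbf{1}_n^T \alpha - \frac{1}{2}\,\alpha^T Y X X^T Y \alpha \quad \text{s.t.} \quad \mathbf{0}_n \leq \alpha \leq C\,\mathbf{1}_n, \quad y^T \alpha = 0,$$

with the primal solution recovered as $w = X^T Y \alpha$.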