on the correct side of the decision boundary. The signed geometric margin is the distance from the
decision boundary to the closest labeled instance:
\[
\min_{i=1\dots l}\; y_i f(\mathbf{x}_i)/\|\mathbf{w}\| . \tag{6.3}
\]
If a decision boundary separates the labeled training sample, the geometric margin is positive. We
want to find the decision boundary that maximizes the geometric margin:
\[
\max_{\mathbf{w},b}\;\min_{i=1\dots l}\; y_i f(\mathbf{x}_i)/\|\mathbf{w}\| . \tag{6.4}
\]
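To make the objective in (6.3) and (6.4) concrete, here is a minimal NumPy sketch (the toy data and the function name geometric_margin are illustrative, not from the text) that evaluates the signed geometric margin of a fixed candidate (w, b):

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Signed geometric margin min_i y_i (w'x_i + b) / ||w|| over the labeled sample."""
    f = X @ w + b                        # decision values f(x_i)
    return np.min(y * f) / np.linalg.norm(w)

# Hypothetical toy data: two points per class in 2D.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0         # an arbitrary candidate decision boundary

print(geometric_margin(w, b, X, y))      # positive, since this (w, b) separates the sample
```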
This is difficult to optimize directly, so we will rewrite it into an equivalent form. First, we notice
that one can arbitrarily scale the parameters (w, b) → (cw, cb) by any c > 0 without changing the value of (6.4). To remove this nuisance degree of freedom, we require that the instances closest to the decision boundary satisfy
\[
\min_{i=1\dots l}\; y_i f(\mathbf{x}_i) = 1 . \tag{6.5}
\]
This implies that for all labeled instances i = 1...l, we have the constraint
\[
y_i f(\mathbf{x}_i) = y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 . \tag{6.6}
\]
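As a quick numerical check of this rescaling argument (again an illustrative sketch, reusing the hypothetical toy data from above), the following snippet verifies that scaling (w, b) → (cw, cb) leaves the geometric margin unchanged, and that dividing (w, b) by min_i y_i f(x_i) puts the boundary into the canonical form (6.5):

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

def margin(w, b):
    # Signed geometric margin of (6.3) for this fixed toy sample.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

c = 5.0
print(np.isclose(margin(w, b), margin(c * w, c * b)))   # True: scaling by c > 0 is a nuisance

scale = np.min(y * (X @ w + b))      # current value of min_i y_i f(x_i)
w_c, b_c = w / scale, b / scale      # canonical rescaling
print(np.min(y * (X @ w_c + b_c)))   # 1.0, so the closest instances satisfy (6.5)
```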
We can then rewrite (6.4) as a constrained optimization problem:
\[
\max_{\mathbf{w},b}\; 1/\|\mathbf{w}\| \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\; i = 1\dots l . \tag{6.7}
\]
In addition, maximizing 1/||w|| is equivalent to minimizing ||w||^2. This leads to the following quadratic program, which is easier to optimize:
\[
\min_{\mathbf{w},b}\; \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\; i = 1\dots l . \tag{6.8}
\]
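Because (6.8) is a convex quadratic program, it can be handed directly to an off-the-shelf solver. The sketch below uses the cvxpy modeling library on the same hypothetical toy data; cvxpy is our illustrative choice here, not something the text prescribes:

```python
import cvxpy as cp
import numpy as np

# Hypothetical separable toy data (same as in the earlier sketches).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
l, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# (6.8): minimize ||w||^2 subject to y_i (w'x_i + b) >= 1 for i = 1...l.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

print(w.value, b.value)   # maximum-margin separating hyperplane for this toy sample
```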
So far, we have assumed that the training sample is linearly separable. This means the constraints in (6.8) can all be satisfied by at least some parameters w, b. We now relax this assumption to allow any training sample, even linearly non-separable ones. When a training sample is linearly non-separable, at least one constraint in (6.8) cannot be satisfied by any parameters. This renders (6.8) infeasible. We have to relax the constraints by allowing y_i f(x_i) < 1 on some instances. But we will penalize the amount by which we have to make this relaxation. This is done by introducing slack variables, i.e., the amount of relaxation for each instance, ξ_i ≥ 0, i = 1...l, into (6.8):
\[
\min_{\mathbf{w},b,\xi}\; \sum_{i=1}^{l} \xi_i + \lambda \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1\dots l . \tag{6.9}
\]
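The soft-margin program (6.9) is still a quadratic program, so the same solver sketch applies. In the illustrative snippet below (again using cvxpy, with a hypothetical non-separable toy sample and an arbitrarily chosen λ), the slack variables come out nonzero exactly on the instances where y_i f(x_i) < 1:

```python
import cvxpy as cp
import numpy as np

# Hypothetical non-separable toy data: the third (positive) point lies on the wrong side.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.5, -1.5], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, 1, -1, -1])
l, d = X.shape
lam = 0.1                   # the weight lambda on ||w||^2 in (6.9); value chosen arbitrarily

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(l)         # slack variables xi_i, one per labeled instance

# (6.9): minimize sum_i xi_i + lambda * ||w||^2 under the relaxed margin constraints.
objective = cp.Minimize(cp.sum(xi) + lam * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print(w.value, b.value, xi.value)   # nonzero slack only where y_i f(x_i) < 1
```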