on the correct side of the decision boundary. The signed geometric margin is the distance from the
decision boundary to the closest labeled instance:
\min_{i=1,\ldots,l} \; y_i f(\mathbf{x}_i) / \|\mathbf{w}\| .  (6.3)
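Definition (6.3) translates directly into a few lines of NumPy. The toy dataset and the parameters (w, b) below are illustrative assumptions, not from the text:

```python
import numpy as np

# Toy linearly separable data: two instances per class (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# An assumed decision boundary f(x) = w.x + b.
w = np.array([1.0, 1.0])
b = 0.0

# Signed geometric margin (6.3): min_i y_i f(x_i) / ||w||.
margin = np.min(y * (X @ w + b)) / np.linalg.norm(w)
print(margin)  # ≈ 2.828 (= 2*sqrt(2)), the distance to the closest instance
```

Because the data are on the correct side of this boundary, the margin comes out positive, matching the remark after (6.3).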
If a decision boundary separates the labeled training sample, the geometric margin is positive. We
want to find the decision boundary that maximizes the geometric margin:
\max_{\mathbf{w},b} \; \min_{i=1,\ldots,l} \; y_i f(\mathbf{x}_i) / \|\mathbf{w}\| .  (6.4)
This is difficult to optimize directly, so we will rewrite it into an equivalent form. First, we notice
that one can arbitrarily scale the parameters (w, b) → (cw, cb) for any c > 0 in (6.4) without changing
the decision boundary or the value of the objective. To remove this nuisance
degree of freedom, we require that the instances closest to the decision boundary satisfy
\min_{i=1,\ldots,l} \; y_i f(\mathbf{x}_i) = 1 .  (6.5)
This implies that for all labeled instances i = 1, \ldots, l, we have the constraint

y_i f(\mathbf{x}_i) = y_i (\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 .  (6.6)
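The scaling freedom is easy to verify numerically: multiplying (w, b) by any c > 0 rescales every y_i f(x_i) and ||w|| by the same factor, so the geometric margin (6.3) is unchanged, and a particular c enforces the canonical condition (6.5). A small sketch with assumed toy values:

```python
import numpy as np

X = np.array([[2.0, 1.0], [-1.0, -2.0]])  # illustrative instances
y = np.array([1, -1])
w, b = np.array([1.0, 1.0]), 0.5          # an assumed boundary

def geom_margin(w, b):
    # min_i y_i (w.x_i + b) / ||w||, as in (6.3)
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

m1 = geom_margin(w, b)
m2 = geom_margin(10 * w, 10 * b)   # rescaled parameters
print(np.isclose(m1, m2))          # True: the margin is unchanged

# Canonical scaling (6.5): pick c so that min_i y_i f(x_i) = 1.
c = 1.0 / np.min(y * (X @ w + b))
print(np.min(y * (X @ (c * w) + c * b)))  # ≈ 1.0
```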
We can then rewrite (6.4) as a constrained optimization problem:
\max_{\mathbf{w},b} \; 1/\|\mathbf{w}\| \quad \text{subject to} \quad y_i (\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \; i = 1, \ldots, l .  (6.7)
In addition, maximizing 1/\|\mathbf{w}\| is equivalent to minimizing \|\mathbf{w}\|^2. This leads to the following
quadratic program, which is easier to optimize:
\min_{\mathbf{w},b} \; \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i (\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \; i = 1, \ldots, l .  (6.8)
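Being a quadratic program, (6.8) can be handed to a generic constrained solver. The sketch below uses scipy.optimize.minimize with the SLSQP method on an assumed toy dataset; a production SVM would use a dedicated QP or dual solver instead:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative separable data (not from the text).
X = np.array([[2.0, 2.0], [4.0, 4.0], [-2.0, -2.0], [-4.0, -4.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(theta):
    w = theta[:2]
    return w @ w                      # ||w||^2, the objective of (6.8)

def margin_constraints(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1.0      # each entry must be >= 0

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               constraints=[{"type": "ineq", "fun": margin_constraints}],
               method="SLSQP")
w, b = res.x[:2], res.x[2]
print(w, b, res.fun)
# All margin constraints hold at the optimum:
print(np.all(y * (X @ w + b) >= 1 - 1e-6))  # True
```

For this symmetric dataset the closest instances are (2, 2) and (-2, -2), so the optimum is w = (0.25, 0.25), b = 0 with objective value 0.125.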
So far, we have assumed that the training sample is linearly separable. This means the constraints
in (6.8) can all be satisfied by at least some parameters w, b. We now relax this assumption to
allow any training sample, even linearly non-separable ones. When a training sample is linearly
non-separable, at least one constraint in (6.8) cannot be satisfied by any parameters. This renders (6.8)
infeasible. We have to relax the constraints by allowing y_i f(\mathbf{x}_i) < 1 on some instances. But we will
penalize the amount by which we have to make this relaxation. This is done by introducing slack
variables, i.e., the amount of relaxation for each instance, \xi_i \ge 0, i = 1, \ldots, l, into (6.8):
\min_{\mathbf{w},b,\boldsymbol{\xi}} \; \sum_{i=1}^{l} \xi_i + \lambda \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i (\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, l .  (6.9)
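At the optimum of (6.9) each slack takes the value \xi_i = \max(0, 1 - y_i f(\mathbf{x}_i)), so the problem is equivalent to minimizing the unconstrained hinge-loss objective \sum_i \max(0, 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)) + \lambda \|\mathbf{w}\|^2. A minimal subgradient-descent sketch on assumed toy data (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

# Toy non-separable data: the last point contradicts the first,
# so no hyperplane satisfies all hard constraints (illustrative).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [2.5, 2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam, lr = 0.1, 0.01

w, b = np.zeros(2), 0.0
for _ in range(2000):
    margins = y * (X @ w + b)
    active = margins < 1                       # instances with nonzero slack
    # Subgradient of the hinge loss plus the regularizer.
    grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(axis=0)
    grad_b = -y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

xi = np.maximum(0.0, 1 - y * (X @ w + b))      # recovered slack variables
print(xi)  # the contradictory instance typically absorbs most of the slack
```

The non-separable instance ends up with a large \xi_i, while separable instances pay little or no penalty, which is exactly the trade-off (6.9) encodes.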