In the equation above, $\sum_{i=1}^{l} \xi_i$ is the total amount of relaxation, and we would like to minimize a weighted sum of it and $\|\mathbf{w}\|^2$. The weight $\lambda$ balances the two objectives. This formulation thus still attempts to find the maximum margin separation, but allows some training instances to be on the wrong side of the decision boundary. It is still a quadratic program. The optimization problem (6.9) is known as the primal form of a linear SVM.
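As a concrete illustration, the primal soft-margin problem can be handed to an off-the-shelf quadratic-program solver. The sketch below uses the cvxpy modeling library on a hypothetical four-point data set; the toy data and the choice $\lambda = 0.1$ are our illustrative assumptions, not from the text:

    import cvxpy as cp
    import numpy as np

    # Hypothetical toy data: l = 4 instances in d = 2 dimensions, labels in {-1, +1}.
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    lam = 0.1  # the weight lambda in the objective (an illustrative value)

    w = cp.Variable(2)
    b = cp.Variable()
    xi = cp.Variable(4, nonneg=True)  # slack variables, xi_i >= 0

    # Primal soft-margin SVM: minimize total slack plus lambda * ||w||^2,
    # subject to y_i (w' x_i + b) >= 1 - xi_i for every training instance.
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
    problem = cp.Problem(cp.Minimize(cp.sum(xi) + lam * cp.sum_squares(w)), constraints)
    problem.solve()
    print(w.value, b.value, xi.value)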
It is illustrative to cast (6.9) into a regularized risk minimization framework, as this is how
we will extend it to S3VMs. Consider the following optimization problem:
$$\min_{\xi}\ \xi \quad \text{subject to} \quad \xi \ge z, \quad \xi \ge 0. \tag{6.10}$$

It is easy to verify that when $z \le 0$, the objective is $0$; when $z > 0$, the objective is $z$. Therefore, solving problem (6.10) is equivalent to evaluating the function

$$\max(z, 0). \tag{6.11}$$
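To see this equivalence numerically, one can solve (6.10) as a one-variable linear program for a few values of $z$ and compare against $\max(z, 0)$. A minimal sketch, assuming SciPy is available (the helper name lp_value is ours):

    from scipy.optimize import linprog

    def lp_value(z):
        """Solve (6.10) for a scalar z: minimize xi subject to xi >= z and xi >= 0."""
        # linprog expects A_ub @ x <= b_ub, so xi >= z becomes -xi <= -z;
        # xi >= 0 is expressed through the variable bounds.
        res = linprog(c=[1.0], A_ub=[[-1.0]], b_ub=[-z], bounds=[(0, None)])
        return res.fun

    for z in [-1.0, 0.0, 2.5]:
        print(f"z = {z:+.1f}: LP optimum = {lp_value(z):.1f}, max(z, 0) = {max(z, 0.0):.1f}")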
Noting that in (6.9) the inequality constraints on $\xi_i$ can be written as $\xi_i \ge 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)$, we set $z_i = 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)$ to turn (6.9) into a sum of terms of the form (6.10). This in turn converts (6.9) into the following equivalent, but unconstrained, regularized risk minimization problem

$$\min_{\mathbf{w}, b}\ \sum_{i=1}^{l} \max\big(1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b),\ 0\big) + \lambda \|\mathbf{w}\|^2, \tag{6.12}$$
where the first term corresponds to the loss function

$$c(\mathbf{x}, y, f(\mathbf{x})) = \max\big(1 - y(\mathbf{w}^\top \mathbf{x} + b),\ 0\big), \tag{6.13}$$

and the second term corresponds to the regularizer

$$\Omega(f) = \|\mathbf{w}\|^2. \tag{6.14}$$
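Because (6.12) is unconstrained, it can be minimized directly, for example by subgradient descent on the loss term (6.13) plus the regularizer (6.14). The following is a minimal sketch, not the book's algorithm; the step size, epoch count, and function names are our illustrative choices:

    import numpy as np

    def objective(w, b, X, y, lam):
        """The regularized risk in (6.12): hinge-loss sum plus lambda * ||w||^2."""
        margins = y * (X @ w + b)
        return np.maximum(1.0 - margins, 0.0).sum() + lam * (w @ w)

    def train_linear_svm(X, y, lam=0.1, lr=0.01, epochs=200):
        """Subgradient descent on (6.12). X is (l, d); y holds labels in {-1, +1}."""
        l, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            active = margins < 1.0            # instances with nonzero hinge loss
            # Subgradient of the hinge-loss sum, plus the gradient of lambda * ||w||^2.
            grad_w = -(y[active, None] * X[active]).sum(axis=0) + 2.0 * lam * w
            grad_b = -y[active].sum()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b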
The particular loss function (6.13) is known as the hinge loss. We plot the hinge loss as a function of $yf(\mathbf{x}) = y(\mathbf{w}^\top \mathbf{x} + b)$ in Figure 6.3(a). Recall that for well-separated training instances, we have $yf(\mathbf{x}) \ge 1$. Therefore, the hinge loss penalizes instances which are on the correct side of the decision boundary, but within the margin ($0 \le yf(\mathbf{x}) < 1$); it penalizes instances even more if they are on the wrong side of the decision boundary ($yf(\mathbf{x}) < 0$). The shape of the loss function resembles a hinge, hence the name.
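The three regimes are easy to check numerically. A small sketch of ours, with illustrative margin values:

    import numpy as np

    def hinge(z):
        """Hinge loss (6.13) as a function of the margin z = y * f(x)."""
        return np.maximum(1.0 - z, 0.0)

    # Well separated, within the margin, and misclassified, respectively.
    for z in [1.5, 0.5, -0.5]:
        print(f"y*f(x) = {z:+.1f}  ->  hinge loss = {hinge(z):.1f}")
    # y*f(x) = +1.5  ->  hinge loss = 0.0  (no penalty)
    # y*f(x) = +0.5  ->  hinge loss = 0.5  (correct side, within the margin)
    # y*f(x) = -0.5  ->  hinge loss = 1.5  (wrong side, larger penalty)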
We will not discuss the dual form of SVMs, nor the kernel trick that essentially maps the features to a higher-dimensional space to handle non-linear problems. These are crucial to the success of SVMs, but are not necessary to introduce S3VMs. However, we shall point out that it is straightforward to apply the kernel trick to S3VMs, too.