In the equation above, $\sum_{i=1}^{l} \xi_i$ is the total amount of relaxation, and we would like to minimize a weighted sum of it and $\|\mathbf{w}\|^2$. The weight $\lambda$ balances the two objectives. This formulation thus still attempts to find the maximum margin separation, but allows some training instances to be on the wrong side of the decision boundary. It is still a quadratic program. The optimization problem (6.9) is known as the primal form of a linear SVM.
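As a concrete illustration, the primal soft-margin problem can be handed to an off-the-shelf quadratic-program solver. The sketch below uses the cvxpy modeling library on a hypothetical four-point data set; the toy data and the choice $\lambda = 0.1$ are our illustrative assumptions, not from the text:

    import cvxpy as cp
    import numpy as np

    # Hypothetical toy data: l = 4 instances in d = 2 dimensions, labels in {-1, +1}.
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    lam = 0.1  # the weight lambda in the objective (an illustrative value)

    w = cp.Variable(2)
    b = cp.Variable()
    xi = cp.Variable(4, nonneg=True)  # slack variables, xi_i >= 0

    # Primal soft-margin SVM: minimize total slack plus lambda * ||w||^2,
    # subject to y_i (w' x_i + b) >= 1 - xi_i for every training instance.
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
    problem = cp.Problem(cp.Minimize(cp.sum(xi) + lam * cp.sum_squares(w)), constraints)
    problem.solve()
    print(w.value, b.value, xi.value)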
It is illustrative to cast (6.9) into a regularized risk minimization framework, as this is how
we will extend it to S3VMs. Consider the following optimization problem:
$$\min_{\xi}\ \xi \quad \text{subject to} \quad \xi \ge z, \quad \xi \ge 0. \tag{6.10}$$

It is easy to verify that when $z \le 0$, the objective is $0$; when $z > 0$, the objective is $z$. Therefore, solving problem (6.10) is equivalent to evaluating the function

$$\max(z, 0). \tag{6.11}$$
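To see this equivalence numerically, one can solve (6.10) as a one-variable linear program for a few values of $z$ and compare against $\max(z, 0)$. A minimal sketch, assuming SciPy is available (the helper name lp_value is ours):

    from scipy.optimize import linprog

    def lp_value(z):
        """Solve (6.10) for a scalar z: minimize xi subject to xi >= z and xi >= 0."""
        # linprog expects A_ub @ x <= b_ub, so xi >= z becomes -xi <= -z;
        # xi >= 0 is expressed through the variable bounds.
        res = linprog(c=[1.0], A_ub=[[-1.0]], b_ub=[-z], bounds=[(0, None)])
        return res.fun

    for z in [-1.0, 0.0, 2.5]:
        print(f"z = {z:+.1f}: LP optimum = {lp_value(z):.1f}, max(z, 0) = {max(z, 0.0):.1f}")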
Noting that in (6.9) the inequality constraints on $\xi_i$ can be written as $\xi_i \ge 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)$, we set $z_i = 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)$ to turn (6.9) into a sum of terms of the form (6.10). This in turn converts (6.9) into the following equivalent, but unconstrained, regularized risk minimization problem

$$\min_{\mathbf{w}, b}\ \sum_{i=1}^{l} \max\big(1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b),\ 0\big) + \lambda \|\mathbf{w}\|^2, \tag{6.12}$$
where the first term corresponds to the loss function

$$c(\mathbf{x}, y, f(\mathbf{x})) = \max\big(1 - y(\mathbf{w}^\top \mathbf{x} + b),\ 0\big), \tag{6.13}$$

and the second term corresponds to the regularizer

$$\Omega(f) = \|\mathbf{w}\|^2. \tag{6.14}$$
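Because (6.12) is unconstrained, it can be minimized directly, for example by subgradient descent on the loss term (6.13) plus the regularizer (6.14). The following is a minimal sketch, not the book's algorithm; the step size, epoch count, and function names are our illustrative choices:

    import numpy as np

    def objective(w, b, X, y, lam):
        """The regularized risk in (6.12): hinge-loss sum plus lambda * ||w||^2."""
        margins = y * (X @ w + b)
        return np.maximum(1.0 - margins, 0.0).sum() + lam * (w @ w)

    def train_linear_svm(X, y, lam=0.1, lr=0.01, epochs=200):
        """Subgradient descent on (6.12). X is (l, d); y holds labels in {-1, +1}."""
        l, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            active = margins < 1.0            # instances with nonzero hinge loss
            # Subgradient of the hinge-loss sum, plus the gradient of lambda * ||w||^2.
            grad_w = -(y[active, None] * X[active]).sum(axis=0) + 2.0 * lam * w
            grad_b = -y[active].sum()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b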
The particular loss function (6.13) is known as the hinge loss. We plot the hinge loss as a function of $yf(\mathbf{x}) = y(\mathbf{w}^\top \mathbf{x} + b)$ in Figure 6.3(a). Recall that for well-separated training instances, we have $yf(\mathbf{x}) \ge 1$. Therefore, the hinge loss penalizes instances which are on the correct side of the decision boundary, but within the margin ($0 \le yf(\mathbf{x}) < 1$); it penalizes instances even more if they are on the wrong side of the decision boundary ($yf(\mathbf{x}) < 0$). The shape of the loss function resembles a hinge, hence the name.
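The three regimes are easy to check numerically. A small sketch of ours, with illustrative margin values:

    import numpy as np

    def hinge(z):
        """Hinge loss (6.13) as a function of the margin z = y * f(x)."""
        return np.maximum(1.0 - z, 0.0)

    # Well separated, within the margin, and misclassified, respectively.
    for z in [1.5, 0.5, -0.5]:
        print(f"y*f(x) = {z:+.1f}  ->  hinge loss = {hinge(z):.1f}")
    # y*f(x) = +1.5  ->  hinge loss = 0.0  (no penalty)
    # y*f(x) = +0.5  ->  hinge loss = 0.5  (correct side, within the margin)
    # y*f(x) = -0.5  ->  hinge loss = 1.5  (wrong side, larger penalty)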
We will not discuss the dual form of SVMs, nor the kernel trick that essentially maps the features to a higher-dimensional space to handle non-linear problems. These are crucial to the success of SVMs, but are not necessary to introduce S3VMs. However, we shall point out that it is straightforward to apply the kernel trick to S3VMs, too.