on the correct side of the decision boundary. The signed geometric margin is the distance from the
decision boundary to the closest labeled instance:
\[
\min_{i=1\dots l}\; y_i f(\mathbf{x}_i)/\|\mathbf{w}\| . \tag{6.3}
\]
If a decision boundary separates the labeled training sample, the geometric margin is positive. We
want to find the decision boundary that maximizes the geometric margin:
\[
\max_{\mathbf{w},b}\;\min_{i=1\dots l}\; y_i f(\mathbf{x}_i)/\|\mathbf{w}\| . \tag{6.4}
\]
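To make the objective in (6.3) and (6.4) concrete, here is a minimal NumPy sketch (the toy data and the function name geometric_margin are illustrative, not from the text) that evaluates the signed geometric margin of a fixed candidate (w, b):

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Signed geometric margin min_i y_i (w'x_i + b) / ||w|| over the labeled sample."""
    f = X @ w + b                        # decision values f(x_i)
    return np.min(y * f) / np.linalg.norm(w)

# Hypothetical toy data: two points per class in 2D.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0         # an arbitrary candidate decision boundary

print(geometric_margin(w, b, X, y))      # positive, since this (w, b) separates the sample
```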
This is difficult to optimize directly, so we will rewrite it into an equivalent form. First, we notice
that one can arbitrarily scale the parameters (w, b) → (cw, cb) by any c > 0 without changing the value of (6.4). To remove this nuisance degree of freedom, we require that the instances closest to the decision boundary satisfy
\[
\min_{i=1\dots l}\; y_i f(\mathbf{x}_i) = 1 . \tag{6.5}
\]
This implies that for all labeled instances i = 1...l, we have the constraint
\[
y_i f(\mathbf{x}_i) = y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 . \tag{6.6}
\]
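As a quick numerical check of this rescaling argument (again an illustrative sketch, reusing the hypothetical toy data from above), the following snippet verifies that scaling (w, b) → (cw, cb) leaves the geometric margin unchanged, and that dividing (w, b) by min_i y_i f(x_i) puts the boundary into the canonical form (6.5):

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

def margin(w, b):
    # Signed geometric margin of (6.3) for this fixed toy sample.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

c = 5.0
print(np.isclose(margin(w, b), margin(c * w, c * b)))   # True: scaling by c > 0 is a nuisance

scale = np.min(y * (X @ w + b))      # current value of min_i y_i f(x_i)
w_c, b_c = w / scale, b / scale      # canonical rescaling
print(np.min(y * (X @ w_c + b_c)))   # 1.0, so the closest instances satisfy (6.5)
```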
We can then rewrite (6.4) as a constrained optimization problem:
\[
\max_{\mathbf{w},b}\; 1/\|\mathbf{w}\| \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\; i = 1\dots l . \tag{6.7}
\]
In addition, maximizing 1/||w|| is equivalent to minimizing ||w||^2. This leads to the following quadratic program, which is easier to optimize:
\[
\min_{\mathbf{w},b}\; \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\; i = 1\dots l . \tag{6.8}
\]
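Because (6.8) is a convex quadratic program, it can be handed directly to an off-the-shelf solver. The sketch below uses the cvxpy modeling library on the same hypothetical toy data; cvxpy is our illustrative choice here, not something the text prescribes:

```python
import cvxpy as cp
import numpy as np

# Hypothetical separable toy data (same as in the earlier sketches).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
l, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# (6.8): minimize ||w||^2 subject to y_i (w'x_i + b) >= 1 for i = 1...l.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

print(w.value, b.value)   # maximum-margin separating hyperplane for this toy sample
```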
So far, we have assumed that the training sample is linearly separable. This means the constraints in (6.8) can all be satisfied by at least some parameters w, b. We now relax this assumption to allow any training sample, even linearly non-separable ones. When a training sample is linearly non-separable, at least one constraint in (6.8) cannot be satisfied by any parameters. This renders (6.8) infeasible. We have to relax the constraints by allowing y_i f(x_i) < 1 on some instances. But we will penalize the amount by which we have to make this relaxation. This is done by introducing slack variables, i.e., the amount of relaxation for each instance, ξ_i ≥ 0, i = 1...l, into (6.8):
\[
\min_{\mathbf{w},b,\xi}\; \sum_{i=1}^{l} \xi_i + \lambda \|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1\dots l . \tag{6.9}
\]
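The soft-margin program (6.9) is still a quadratic program, so the same solver sketch applies. In the illustrative snippet below (again using cvxpy, with a hypothetical non-separable toy sample and an arbitrarily chosen λ), the slack variables come out nonzero exactly on the instances where y_i f(x_i) < 1:

```python
import cvxpy as cp
import numpy as np

# Hypothetical non-separable toy data: the third (positive) point lies on the wrong side.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.5, -1.5], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, 1, -1, -1])
l, d = X.shape
lam = 0.1                   # the weight lambda on ||w||^2 in (6.9); value chosen arbitrarily

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(l)         # slack variables xi_i, one per labeled instance

# (6.9): minimize sum_i xi_i + lambda * ||w||^2 under the relaxed margin constraints.
objective = cp.Minimize(cp.sum(xi) + lam * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print(w.value, b.value, xi.value)   # nonzero slack only where y_i f(x_i) < 1
```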