to measure the confidence is the entropy. For a Bernoulli random variable with probability p, the
entropy is defined as

H(p) = −p log p − (1 − p) log(1 − p).    (6.28)
The entropy H reaches its minimum 0 when p = 0 or p = 1, i.e., when the outcome is most certain;
H reaches its maximum 1 when p = 0.5, i.e., most uncertain. Given an unlabeled training sample
{x_j}_{j=l+1}^{l+u}, the entropy regularizer for logistic regression is defined as
Ω(f) = Σ_{j=l+1}^{l+u} H(p(y = 1 | x_j, w, b)) = Σ_{j=l+1}^{l+u} H(1 / (1 + exp(−f(x_j)))).    (6.29)
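As an illustrative sketch (not from the text), the entropy of Eq. (6.28) and the regularizer of Eq. (6.29) can be computed in a few lines of NumPy. Base-2 logarithms are used so that the maximum entropy is 1, as stated above; the function names and the decision values passed in are made-up for illustration.

```python
import numpy as np

def bernoulli_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p); H(0) = H(1) = 0, H(0.5) = 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def entropy_regularizer(f_values):
    """Sum of H(1 / (1 + exp(-f(x_j)))) over unlabeled decision values, Eq. (6.29)."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(f_values)))  # p(y = 1 | x_j, w, b)
    return bernoulli_entropy(p).sum()

print(bernoulli_entropy(0.5))               # 1.0: most uncertain outcome
# Confident predictions (|f| large) incur almost no penalty,
# while uncertain ones (f near 0) incur nearly one bit each:
print(entropy_regularizer([5.0, -6.0]))     # small
print(entropy_regularizer([0.0, 0.0]))      # 2.0
```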
The entropy regularizer is small if the classification on the unlabeled instances is certain. Figure 6.4(b)
shows the entropy regularizer on a single unlabeled instance x as a function of f( x ) . Note its similarity
to the hat loss in Figure 6.3(b). In direct analogy to S3VMs, we can define semi-supervised logistic
regression by incorporating this entropy regularizer:
min_{w,b} Σ_{i=1}^{l} log(1 + exp(−y_i f(x_i))) + λ_1 ||w||^2 + λ_2 Σ_{j=l+1}^{l+u} H(1 / (1 + exp(−f(x_j))))    (6.30)
6.4 THE ASSUMPTION OF S3VMS AND ENTROPY REGULARIZATION
Remark 6.1. The assumption of both S3VMs and entropy regularization is that the classes are
well-separated, such that the decision boundary falls into a low density region in the feature space,
and does not cut through dense unlabeled data.
If this assumption does not hold, these algorithms may be led astray. We now describe an
example scenario where S3VMs may lead to particularly poor performance.
Example 6.2. S3VMs when the Model Assumption Does Not Hold Consider the data shown in
Figure 6.5. The underlying data distribution p(x) is uniform in a circle of radius 0.5, except for a gap
of width 0.2 along the diagonal y = −x where the density is 0. The true class boundary is along the
anti-diagonal y = x, though. Clearly, the classes are not well-separated, and the low density region
does not correspond to the true decision boundary. This poses two problems. First, consider the case
in which the labeled instances appear on the same side of the low density region (Figure 6.5(a)). An
S3VM's search for a gap between the two classes may get stuck in one of many possible local minima.
The resulting decision boundary may be worse than the decision boundary of an SVM that does
not try to exploit unlabeled data at all. Second and more severely, if the labeled instances appear on
opposite sides of the gap (Figure 6.5(b)), the S3VM will be attracted to this region and produce a
very poor classifier that gets half of its predictions incorrect.
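A distribution like the one in this example can be reproduced by rejection sampling. The sketch below is one made-up way to generate such data (the function name and seed are arbitrary assumptions), useful for experimenting with how an S3VM's decision boundary snaps to the low-density gap rather than the true boundary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

def sample_gap_disk(n):
    """Rejection-sample n points uniform in a disk of radius 0.5, excluding a
    gap of width 0.2 along the diagonal y = -x (distance |x + y| / sqrt(2) < 0.1)."""
    pts = []
    while len(pts) < n:
        px, py = rng.uniform(-0.5, 0.5, size=2)
        inside_disk = px**2 + py**2 <= 0.25
        in_gap = abs(px + py) / np.sqrt(2) < 0.1
        if inside_disk and not in_gap:
            pts.append((px, py))
    return np.array(pts)

X = sample_gap_disk(500)
# The true class boundary is the anti-diagonal y = x, which cuts straight
# through the dense regions rather than through the low-density gap.
y = np.sign(X[:, 1] - X[:, 0])
```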