Note it only involves the marginal distribution on $X$. For a function whose decision boundary cuts through unlabeled data, $e_U(f)$ will be large. Similarly, we can define a "sample unlabeled data error"

$$\hat{e}_U(f) = \frac{1}{u} \sum_{i=l+1}^{l+u} c(f, x_i). \qquad (8.13)$$
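To make (8.13) concrete, here is a minimal Python sketch that evaluates $\hat{e}_U(f)$ on a toy unlabeled sample. The hat-loss form of the incompatibility `c_hat_loss`, the linear classifier `f`, and the data are all illustrative assumptions, not definitions from the text; any $c$ mapping into $[0, 1]$ would do.

```python
import numpy as np

def c_hat_loss(f_values):
    """Assumed incompatibility (S3VM-style hat loss): close to 1 when the
    decision boundary passes near x, i.e., when |f(x)| is small."""
    return np.maximum(0.0, 1.0 - np.abs(f_values))

def sample_unlabeled_error(f, X_unlabeled, c=c_hat_loss):
    """Equation (8.13): average incompatibility over the u unlabeled points."""
    return np.mean(c(f(X_unlabeled)))

# Hypothetical usage: a fixed linear classifier on toy 2-D unlabeled data.
rng = np.random.default_rng(0)
X_u = rng.normal(size=(100, 2))        # u = 100 unlabeled instances
w = np.array([2.0, -1.0])
f = lambda X: X @ w                    # f(x) = w . x
print(sample_unlabeled_error(f, X_u))  # hat{e}_U(f), a value in [0, 1]
```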
Now, with an argument very similar to Theorem 8.1, if we see $u$ unlabeled instances where

$$u = \frac{1}{\epsilon}\left(\log |F| + \log \frac{2}{\delta}\right), \qquad (8.14)$$

then with probability at least $1 - \delta/2$, all $f \in F$ with $\hat{e}_U(f) = 0$ have $e_U(f) \le \epsilon$. That is, this amount of unlabeled training data allows us to say with confidence that, if we find a function with $\hat{e}_U(f) = 0$, then it must have come from the sub-family

$$F(\epsilon) \equiv \{ f \in F : e_U(f) \le \epsilon \}. \qquad (8.15)$$

Note $F(\epsilon)$ may be much smaller than $F$.
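A toy sketch of the sub-family (8.15): enumerate a small finite family of 1-D threshold classifiers and keep only those whose (estimated) unlabeled error is at most $\epsilon$. The margin-based incompatibility in `e_U_estimate` is an assumption for illustration, and the true $e_U$ is approximated by a large unlabeled sample.

```python
import numpy as np

rng = np.random.default_rng(1)
# Unlabeled sample from two well-separated 1-D clusters.
X_u = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(2, 0.3, 500)])

# A finite family F: threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(-3, 3, 61)

def e_U_estimate(t, X, margin=0.5):
    """Assumed incompatibility, estimated on a large sample: the fraction
    of unlabeled points within `margin` of the boundary t."""
    return np.mean(np.abs(X - t) < margin)

eps = 0.05
F_eps = [t for t in thresholds if e_U_estimate(t, X_u) <= eps]
print(len(thresholds), len(F_eps))  # |F| = 61 vs. |F(eps)|: only thresholds
                                    # placed in low-density regions survive
```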
Then, restricting our analysis to functions with $\hat{e}_U(f) = 0$ and applying Theorem 8.1 again (this time on labeled data), we can guarantee the following. After seeing $l$ labeled training instances where
$$l = \frac{1}{\epsilon}\left(\log |F(\epsilon)| + \log \frac{2}{\delta}\right), \qquad (8.16)$$
with probability at least $1 - \delta/2$, all $f \in F(\epsilon)$ with $\hat{e}(f) = 0$ have $e(f) \le \epsilon$. Putting the previous two results together, we have the following theorem (Theorem 21.5 in [10]):

Theorem 8.2. Simple sample complexity for semi-supervised learning. Assume $F$ is finite. Given any $\epsilon > 0$, $\delta > 0$, if we see $l$ labeled and $u$ unlabeled training instances where

$$l = \frac{1}{\epsilon}\left(\log |F(\epsilon)| + \log \frac{2}{\delta}\right) \quad \text{and} \quad u = \frac{1}{\epsilon}\left(\log |F| + \log \frac{2}{\delta}\right), \qquad (8.17)$$

then with probability at least $1 - \delta$, all $f \in F$ with zero training error $\hat{e}(f) = 0$ and zero sample unlabeled data error $\hat{e}_U(f) = 0$ have $e(f) \le \epsilon$.
Some discussion is in order. This theorem may require less labeled data than Theorem 8.1, because $\log |F(\epsilon)|$ can be much smaller than $\log |F|$. It therefore justifies semi-supervised learning. However, its conditions are stringent: one must be able to find an $f$ whose labeled and unlabeled training errors are both zero. Semi-supervised learning algorithms, such as S3VMs, can be viewed as attempting to minimize the labeled and unlabeled training errors at the same time.
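To get a feel for the trade-off, the following sketch plugs hypothetical numbers into (8.17) and compares the labeled-data requirement against a purely supervised bound. The form assumed for Theorem 8.1, $l = (\log|F| + \log(1/\delta))/\epsilon$, and the sizes $|F| = 10^6$, $|F(\epsilon)| = 10^2$ are illustrative assumptions, not values from the text.

```python
import math

def l_semi(size_F_eps, eps, delta):
    """Labeled sample size from equation (8.17)."""
    return (math.log(size_F_eps) + math.log(2 / delta)) / eps

def u_semi(size_F, eps, delta):
    """Unlabeled sample size from equation (8.17)."""
    return (math.log(size_F) + math.log(2 / delta)) / eps

def l_supervised(size_F, eps, delta):
    """Assumed form of Theorem 8.1's purely supervised requirement:
    l = (log|F| + log(1/delta)) / eps."""
    return (math.log(size_F) + math.log(1 / delta)) / eps

eps, delta = 0.01, 0.05
size_F, size_F_eps = 10**6, 10**2  # hypothetical sizes with |F(eps)| << |F|
print(round(l_supervised(size_F, eps, delta)))   # ~1681 labeled alone
print(round(l_semi(size_F_eps, eps, delta)),     # ~829 labeled ...
      round(u_semi(size_F, eps, delta)))         # ... plus ~1750 unlabeled
```

The point of the comparison: roughly half the labeled examples suffice once (cheap) unlabeled data is available to whittle $F$ down to $F(\epsilon)$.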
The theorem holds for arbitrary incompatibility functions $c$. For example, we can define an "inverse S3VM" function which prefers to cut through dense unlabeled data:

$$c_{\mathrm{inv}}(f, x) = 1 - c_{\mathrm{S3VM}}(f, x). \qquad (8.18)$$
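A minimal sketch of (8.18), again assuming the hat-loss form of $c_{\mathrm{S3VM}}$ used in the earlier examples (an assumption, not the book's definition):

```python
import numpy as np

def c_s3vm(f_values):
    """Assumed S3VM incompatibility (hat loss): penalizes a boundary
    that passes close to unlabeled points."""
    return np.maximum(0.0, 1.0 - np.abs(f_values))

def c_inv(f_values):
    """Equation (8.18): the complement, which is small when |f(x)| is small,
    so minimizing it pulls the boundary through dense unlabeled regions."""
    return 1.0 - c_s3vm(f_values)
```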