CHAPTER 8
Theory and Outlook
We have discussed many different semi-supervised learning algorithms throughout this book. One
might wonder: is there any theoretical guarantee that these algorithms “work”?
In this chapter, we introduce a simple computational theory to justify semi-supervised learning.
Our discussion is based on the notion of compatibility proposed by Balcan and Blum [10], and
the Probably Approximately Correct (PAC) learning framework of Valiant [174]. The particular
theorems in this chapter may not be empirically useful, because they make several strong assumptions
that are unlikely to be met in practice. Nonetheless, the way the problem is posed and the proof
technique will be useful in understanding other, more advanced versions of the learning theory in
the references. In what follows, we will first introduce a simple PAC learning bound for supervised
learning, then extend it to semi-supervised learning. For simplicity, we shall restrict ourselves to
binary classification tasks.
8.1 A SIMPLE PAC BOUND FOR SUPERVISED LEARNING
Recall the supervised learning problem. Let the domain of instances be
X
, and the domain of
Y ={− 1 , 1 }
labels be
.Let P( x ,y) be an unknown joint probability distribution on instances and
i . i . d .
i = 1
labels
X × Y
. Given a training sample
D
consisting of labeled data
D ={
( x i ,y i )
}
P( x ,y) ,
F
, supervised learning learns a function f D F
,f : X Y
and a function family
. We hope f D
minimizes the true error e(f ) , which for any function f is defined as:
e(f ) = E ( x ,y) P [ f( x ) = y ] .
(8.1)
But since P is unknown, we cannot directly verify that. Instead, we can observe only the training
sample error ê(f), which is defined in Definition 1.11:

ê(f) = (1/l) ∑_{i=1}^{l} (f(x_i) ≠ y_i).     (8.2)
For simplicity, we will let the supervised learning algorithm pick the function f_D ∈ F that minimizes
the training sample error. In addition, let us assume that the training sample error is zero:
ê(f_D) = 0.
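As a concrete illustration of Eq. (8.2) and of picking a zero-training-error function, here is a minimal sketch in Python. The toy data and the finite family of threshold classifiers are hypothetical, chosen only to make the computation explicit; they do not come from the text.

```python
import numpy as np

# Hypothetical toy training sample D = {(x_i, y_i)}, i = 1, ..., l, with x in R
# and y in {-1, +1}; not from the text.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = np.where(x > 0.3, 1, -1)  # labels are separable by a threshold at 0.3

# A hypothetical finite function family F: threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(-1.0, 1.0, 41)

def training_error(t):
    """Training sample error of Eq. (8.2): fraction of training points f_t misclassifies."""
    predictions = np.where(x - t > 0, 1, -1)
    return float(np.mean(predictions != y))

# Pick f_D in F that minimizes the training sample error; here the minimum is zero.
errors = [training_error(t) for t in thresholds]
best_t = thresholds[int(np.argmin(errors))]
print(f"chosen threshold t = {best_t:.2f}, training error = {min(errors):.2f}")
```

With a finite family such as this one, minimizing the training sample error is a simple enumeration; the question addressed next is what the chosen function's zero training error tells us about its true error.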
Can we say anything about its true error e(f_D)? As mentioned before, the precise value of e(f_D) is
impossible to compute because we do not know P. However, it turns out that we can bound e(f_D)
without the knowledge of P.
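To make this claim concrete: the classical realizable-case argument for a finite family F yields that, with probability at least 1 − δ over the draw of D, every f ∈ F with ê(f) = 0 satisfies e(f) ≤ (ln|F| + ln(1/δ))/l. The minimal sketch below simply evaluates this expression; the values of |F|, l, and δ are hypothetical and do not appear in the text.

```python
import math

def realizable_pac_bound(family_size, l, delta):
    """Classical realizable-case bound for a finite family F: with probability
    at least 1 - delta, every f in F with zero training error has
    e(f) <= (ln|F| + ln(1/delta)) / l."""
    return (math.log(family_size) + math.log(1.0 / delta)) / l

# Hypothetical numbers: |F| = 1000 functions, l = 500 labeled examples, delta = 0.05.
print(f"e(f_D) <= {realizable_pac_bound(1000, 500, 0.05):.3f} with probability >= 0.95")
```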
First, notice that f_D is a random variable that depends on the training sample D. Next consider
the event {e(f_D) > ε}, i.e., the event that the particular zero-training-error function f_D chosen by
the learning algorithm nonetheless has true error greater than ε.