CHAPTER 8
Theory and Outlook
We have discussed many different semi-supervised learning algorithms throughout this book. One
might wonder: is there any theoretical guarantee that these algorithms “work”?
In this chapter, we introduce a simple computational theory to justify semi-supervised learning.
Our discussion is based on the notion of compatibility proposed by Balcan and Blum [10], and
the Probably Approximately Correct (PAC) learning framework of Valiant [174]. The particular
theorems in this chapter may not be empirically useful, because they make several strong assumptions
that are unlikely to be met in practice. Nonetheless, the way the problem is posed and the proof
technique will be useful in understanding other, more advanced versions of the learning theory in
the references. In what follows, we will first introduce a simple PAC learning bound for supervised
learning, then extend it to semi-supervised learning. For simplicity, we shall restrict ourselves to
binary classification tasks.
8.1 A SIMPLE PAC BOUND FOR SUPERVISED LEARNING
Recall the supervised learning problem. Let the domain of instances be
X
, and the domain of
Y ={− 1 , 1 }
labels be
.Let P( x ,y) be an unknown joint probability distribution on instances and
i . i . d .
i = 1
labels
X × Y
. Given a training sample
D
consisting of labeled data
D ={
( x i ,y i )
}
P( x ,y) ,
F
, supervised learning learns a function f D F
,f : X Y
and a function family
. We hope f D
minimizes the true error e(f ) , which for any function f is defined as:
e(f ) = E ( x ,y) P [ f( x ) = y ] .
(8.1)
But since P is unknown, we cannot directly verify that. Instead, we can observe only the training
sample error ê(f), which is defined in Definition 1.11:

ê(f) = (1/l) ∑_{i=1}^{l} (f(x_i) ≠ y_i).     (8.2)
For simplicity, we will let the supervised learning algorithm pick the function f_D ∈ F that minimizes
the training sample error. In addition, let us assume that the training sample error is zero:
ê(f_D) = 0.
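As a concrete illustration of Eq. (8.2) and of picking a zero-training-error function, here is a minimal sketch in Python. The toy data and the finite family of threshold classifiers are hypothetical, chosen only to make the computation explicit; they do not come from the text.

```python
import numpy as np

# Hypothetical toy training sample D = {(x_i, y_i)}, i = 1, ..., l, with x in R
# and y in {-1, +1}; not from the text.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = np.where(x > 0.3, 1, -1)  # labels are separable by a threshold at 0.3

# A hypothetical finite function family F: threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(-1.0, 1.0, 41)

def training_error(t):
    """Training sample error of Eq. (8.2): fraction of training points f_t misclassifies."""
    predictions = np.where(x - t > 0, 1, -1)
    return float(np.mean(predictions != y))

# Pick f_D in F that minimizes the training sample error; here the minimum is zero.
errors = [training_error(t) for t in thresholds]
best_t = thresholds[int(np.argmin(errors))]
print(f"chosen threshold t = {best_t:.2f}, training error = {min(errors):.2f}")
```

With a finite family such as this one, minimizing the training sample error is a simple enumeration; the question addressed next is what the chosen function's zero training error tells us about its true error.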
Can we say anything about its true error e(f_D)? As mentioned before, the precise value of e(f_D) is
impossible to compute because we do not know P. However, it turns out that we can bound e(f_D)
without the knowledge of P.
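To make this claim concrete: the classical realizable-case argument for a finite family F yields that, with probability at least 1 − δ over the draw of D, every f ∈ F with ê(f) = 0 satisfies e(f) ≤ (ln|F| + ln(1/δ))/l. The minimal sketch below simply evaluates this expression; the values of |F|, l, and δ are hypothetical and do not appear in the text.

```python
import math

def realizable_pac_bound(family_size, l, delta):
    """Classical realizable-case bound for a finite family F: with probability
    at least 1 - delta, every f in F with zero training error has
    e(f) <= (ln|F| + ln(1/delta)) / l."""
    return (math.log(family_size) + math.log(1.0 / delta)) / l

# Hypothetical numbers: |F| = 1000 functions, l = 500 labeled examples, delta = 0.05.
print(f"e(f_D) <= {realizable_pac_bound(1000, 500, 0.05):.3f} with probability >= 0.95")
```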
First, notice that f_D is a random variable that depends on the training sample D. Next consider
the event {e(f_D) > ε}, i.e., the event that the particular zero-training-error function f_D chosen by
the learning algorithm nonetheless has true error greater than ε.