Note it only involves the marginal distribution on $X$. For a function whose decision boundary cuts through unlabeled data, $e_U(f)$ will be large. Similarly, we can define a "sample unlabeled data error"

$$\hat{e}_U(f) = \frac{1}{u} \sum_{i=l+1}^{l+u} c(f, x_i). \qquad (8.13)$$
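To make (8.13) concrete, here is a minimal Python sketch that evaluates $\hat{e}_U(f)$ on a toy unlabeled sample. The hat-loss form of the incompatibility `c_hat_loss`, the linear classifier `f`, and the data are all illustrative assumptions, not definitions from the text; any $c$ mapping into $[0, 1]$ would do.

```python
import numpy as np

def c_hat_loss(f_values):
    """Assumed incompatibility (S3VM-style hat loss): close to 1 when the
    decision boundary passes near x, i.e., when |f(x)| is small."""
    return np.maximum(0.0, 1.0 - np.abs(f_values))

def sample_unlabeled_error(f, X_unlabeled, c=c_hat_loss):
    """Equation (8.13): average incompatibility over the u unlabeled points."""
    return np.mean(c(f(X_unlabeled)))

# Hypothetical usage: a fixed linear classifier on toy 2-D unlabeled data.
rng = np.random.default_rng(0)
X_u = rng.normal(size=(100, 2))        # u = 100 unlabeled instances
w = np.array([2.0, -1.0])
f = lambda X: X @ w                    # f(x) = w . x
print(sample_unlabeled_error(f, X_u))  # hat{e}_U(f), a value in [0, 1]
```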
Now, with an argument very similar to Theorem 8.1, if we see $u$ unlabeled instances where

$$u = \frac{1}{\epsilon}\left(\log |F| + \log \frac{2}{\delta}\right), \qquad (8.14)$$

then with probability at least $1 - \delta/2$, all $f \in F$ with $\hat{e}_U(f) = 0$ have $e_U(f) \le \epsilon$. That is, this amount of unlabeled training data allows us to say with confidence that, if we find a function with $\hat{e}_U(f) = 0$, then it must have come from the sub-family

$$F(\epsilon) \equiv \{ f \in F : e_U(f) \le \epsilon \}. \qquad (8.15)$$

Note $F(\epsilon)$ may be much smaller than $F$.
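A toy sketch of the sub-family (8.15): enumerate a small finite family of 1-D threshold classifiers and keep only those whose (estimated) unlabeled error is at most $\epsilon$. The margin-based incompatibility in `e_U_estimate` is an assumption for illustration, and the true $e_U$ is approximated by a large unlabeled sample.

```python
import numpy as np

rng = np.random.default_rng(1)
# Unlabeled sample from two well-separated 1-D clusters.
X_u = np.concatenate([rng.normal(-2, 0.3, 500), rng.normal(2, 0.3, 500)])

# A finite family F: threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(-3, 3, 61)

def e_U_estimate(t, X, margin=0.5):
    """Assumed incompatibility, estimated on a large sample: the fraction
    of unlabeled points within `margin` of the boundary t."""
    return np.mean(np.abs(X - t) < margin)

eps = 0.05
F_eps = [t for t in thresholds if e_U_estimate(t, X_u) <= eps]
print(len(thresholds), len(F_eps))  # |F| = 61 vs. |F(eps)|: only thresholds
                                    # placed in low-density regions survive
```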
Then, restricting our analysis to functions with $\hat{e}_U(f) = 0$ and applying Theorem 8.1 again (this time on labeled data), we can guarantee the following. After seeing $l$ labeled training instances where
$$l = \frac{1}{\epsilon}\left(\log |F(\epsilon)| + \log \frac{2}{\delta}\right), \qquad (8.16)$$
with probability at least $1 - \delta/2$, all $f \in F(\epsilon)$ with $\hat{e}(f) = 0$ have $e(f) \le \epsilon$. Putting the previous two results together, we have the following theorem (Theorem 21.5 in [10]):

Theorem 8.2. Simple sample complexity for semi-supervised learning. Assume $F$ is finite. Given any $\epsilon > 0$, $\delta > 0$, if we see $l$ labeled and $u$ unlabeled training instances where

$$l = \frac{1}{\epsilon}\left(\log |F(\epsilon)| + \log \frac{2}{\delta}\right) \quad \text{and} \quad u = \frac{1}{\epsilon}\left(\log |F| + \log \frac{2}{\delta}\right), \qquad (8.17)$$

then with probability at least $1 - \delta$, all $f \in F$ with zero training error $\hat{e}(f) = 0$ and zero sample unlabeled data error $\hat{e}_U(f) = 0$ have $e(f) \le \epsilon$.
Some discussion is in order. This theorem may require less labeled data than Theorem 8.1, because $\log |F(\epsilon)|$ can be much smaller than $\log |F|$. It therefore justifies semi-supervised learning. However, its conditions are stringent: one must be able to find an $f$ whose labeled and unlabeled training errors are both zero. Semi-supervised learning algorithms, such as S3VMs, can be viewed as attempting to minimize the labeled and unlabeled training errors at the same time.
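To get a feel for the trade-off, the following sketch plugs hypothetical numbers into (8.17) and compares the labeled-data requirement against a purely supervised bound. The form assumed for Theorem 8.1, $l = (\log|F| + \log(1/\delta))/\epsilon$, and the sizes $|F| = 10^6$, $|F(\epsilon)| = 10^2$ are illustrative assumptions, not values from the text.

```python
import math

def l_semi(size_F_eps, eps, delta):
    """Labeled sample size from equation (8.17)."""
    return (math.log(size_F_eps) + math.log(2 / delta)) / eps

def u_semi(size_F, eps, delta):
    """Unlabeled sample size from equation (8.17)."""
    return (math.log(size_F) + math.log(2 / delta)) / eps

def l_supervised(size_F, eps, delta):
    """Assumed form of Theorem 8.1's purely supervised requirement:
    l = (log|F| + log(1/delta)) / eps."""
    return (math.log(size_F) + math.log(1 / delta)) / eps

eps, delta = 0.01, 0.05
size_F, size_F_eps = 10**6, 10**2  # hypothetical sizes with |F(eps)| << |F|
print(round(l_supervised(size_F, eps, delta)))   # ~1681 labeled alone
print(round(l_semi(size_F_eps, eps, delta)),     # ~829 labeled ...
      round(u_semi(size_F, eps, delta)))         # ... plus ~1750 unlabeled
```

The point of the comparison: roughly half the labeled examples suffice once (cheap) unlabeled data is available to whittle $F$ down to $F(\epsilon)$.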
The theorem holds for arbitrary incompatibility functions $c$. For example, we can define an "inverse S3VM" function which prefers to cut through dense unlabeled data:

$$c_{\mathrm{inv}}(f, x) = 1 - c_{\mathrm{S3VM}}(f, x). \qquad (8.18)$$
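A minimal sketch of (8.18), again assuming the hat-loss form of $c_{\mathrm{S3VM}}$ used in the earlier examples (an assumption, not the book's definition):

```python
import numpy as np

def c_s3vm(f_values):
    """Assumed S3VM incompatibility (hat loss): penalizes a boundary
    that passes close to unlabeled points."""
    return np.maximum(0.0, 1.0 - np.abs(f_values))

def c_inv(f_values):
    """Equation (8.18): the complement, which is small when |f(x)| is small,
    so minimizing it pulls the boundary through dense unlabeled regions."""
    return 1.0 - c_s3vm(f_values)
```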