The pairwise disagreement is defined as the loss on an unlabeled instance $x_i$ when pretending $f_u(x_i)$ is the label and $f_v(x_i)$ is the prediction. Such disagreement is to be minimized. The final prediction for input $x$ is the label least objected to by all the hypotheses:

$$y(x) = \operatorname*{argmin}_{y \in \mathcal{Y}} \; \sum_{v=1}^{k} c\big(x, y, f_v(x)\big). \tag{4.7}$$
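As a minimal sketch (our own illustration; the function and argument names are hypothetical, assuming each hypothesis $f_v$ is a callable and `loss` plays the role of $c$), the prediction rule (4.7) can be written as:

```python
def multiview_predict(x, hypotheses, labels, loss):
    """Prediction rule (4.7): return the label that the k hypotheses
    object to the least, i.e., the argmin of the summed losses."""
    return min(labels, key=lambda y: sum(loss(x, y, f(x)) for f in hypotheses))
```

For the 0/1 loss $c(x, y, f(x)) = \mathbb{1}[y \neq f(x)]$, this rule reduces to a plurality vote among the $k$ hypotheses.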
Different choices of $c$ and $\Omega_{\text{SL}}$ lead to different instantiations of multiview learning. We give a concrete example below.
Example 4.6. Two-View Linear Ridge Regression. Let each instance have two views $x = [x^{(1)}, x^{(2)}]$. Consider two linear regression functions $f^{(1)}(x) = w^\top x^{(1)}$ and $f^{(2)}(x) = v^\top x^{(2)}$. Let the loss function be $c(x, y, f(x)) = (y - f(x))^2$. Let the supervised regularizers be $\Omega_{\text{SL}}(f^{(1)}) = \|w\|^2$ and $\Omega_{\text{SL}}(f^{(2)}) = \|v\|^2$. This particular form of regularization, i.e., penalizing the $\ell_2$ norm of the parameter, is known as ridge regression. The regularized risk minimization problem is

$$\min_{w, v} \; \sum_{i=1}^{l} \left(y_i - w^\top x_i^{(1)}\right)^2 + \sum_{i=1}^{l} \left(y_i - v^\top x_i^{(2)}\right)^2 + \lambda_1 \|w\|^2 + \lambda_1 \|v\|^2 + \lambda_2 \sum_{i=l+1}^{l+u} \left(w^\top x_i^{(1)} - v^\top x_i^{(2)}\right)^2. \tag{4.8}$$
The solution can be found by setting the gradient to zero and solving a system of linear equations.
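This last step can be made concrete with a short NumPy sketch (our own illustration; the function and variable names are hypothetical). Setting the gradient of (4.8) with respect to $w$ and $v$ to zero yields a symmetric block-linear system, which the snippet assembles and solves:

```python
import numpy as np

def two_view_ridge(X1, X2, y, U1, U2, lam1, lam2):
    """Closed-form solution of the two-view ridge problem (4.8).

    X1 (l x d1), X2 (l x d2): labeled instances, views 1 and 2
    U1 (u x d1), U2 (u x d2): unlabeled instances, views 1 and 2
    y (l,): labels; lam1: ridge weight; lam2: agreement weight
    """
    d1, d2 = X1.shape[1], X2.shape[1]
    # Gradient of (4.8) set to zero:
    #   (X1'X1 + lam1 I + lam2 U1'U1) w - lam2 U1'U2 v = X1'y
    #   (X2'X2 + lam1 I + lam2 U2'U2) v - lam2 U2'U1 w = X2'y
    A = X1.T @ X1 + lam1 * np.eye(d1) + lam2 * (U1.T @ U1)
    B = -lam2 * (U1.T @ U2)
    C = X2.T @ X2 + lam1 * np.eye(d2) + lam2 * (U2.T @ U2)
    M = np.block([[A, B], [B.T, C]])
    rhs = np.concatenate([X1.T @ y, X2.T @ y])
    wv = np.linalg.solve(M, rhs)
    return wv[:d1], wv[d1:]
```

For $\lambda_1 > 0$ the block matrix is symmetric positive definite, so the system has a unique solution.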
What is the assumption behind multiview learning? In a regularized risk framework, the assumption is encoded in the regularizer $\Omega_{\text{SSL}}$ (4.6) to be minimized. That is, multiple hypotheses $f_1, \ldots, f_k$ should agree with each other. However, agreement alone is not sufficient. Consider the following counter-example: replicate the feature $k$ times to create $k$ identical "views," and replicate the hypotheses $f_1 = \ldots = f_k$. By definition they all agree, but this does not guarantee that they are any better than single-view learning (in fact, the two are the same). The key insight is that the set of agreeing hypotheses must additionally be a small subset of the hypothesis space $\mathcal{F}$. In contrast, the duplicated hypotheses in the counter-example still occupy the whole hypothesis space $\mathcal{F}$.
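This claim can be checked numerically with the `two_view_ridge` sketch above (again our own illustration): with duplicated views, the multiview solution collapses to ordinary single-view ridge regression.

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # one view, duplicated below
U = rng.normal(size=(50, 3))
y = rng.normal(size=10)
lam1, lam2 = 0.5, 2.0
w, v = two_view_ridge(X, X, y, U, U, lam1, lam2)
# Single-view ridge solves (X'X + lam1 I) w = X'y
w_ridge = np.linalg.solve(X.T @ X + lam1 * np.eye(3), X.T @ y)
print(np.allclose(w, v), np.allclose(w, w_ridge))  # True True
```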
Remark 4.7. Multiview Learning Assumption. Multiview learning is effective when a set of hypotheses $f_1, \ldots, f_k$ agree with each other. Furthermore, there are not many such agreeing sets, and the agreeing set happens to have a small empirical risk.
This concludes our discussion of co-training and multiview learning techniques. These models
use multiple views or classifiers, in conjunction with unlabeled data, in order to reduce the size of
the hypothesis space. We also introduced the regularized risk minimization framework for machine
learning, which will appear again in the next two chapters on graph-based methods and semi-
supervised support vector machines.