The success of regularized risk minimization depends on the regularizer Ω(f). Different
regularizers imply different assumptions of the task. For example, a commonly used regularizer
for f(x) = w^⊤x is

    \Omega(f) = \frac{1}{2} \|w\|^2 .

This particular regularizer penalizes the squared norm of the parameters w. It is helpful to view f as a point whose coordinates are determined by w in the
parameter space. An equivalent form for the optimization problem in (4.3) is

    \min_{f \in \mathcal{F}} R(f) \quad \text{subject to} \quad \Omega(f) \le s,

where s is determined by λ. It becomes clear that the regularizer constrains the radius of the ball in
the parameter space. Within the ball, the f that best fits the training data is chosen. This controls
the complexity of f, and prevents overfitting.
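To make the two forms concrete, here is a minimal sketch for a linear model f(x) = w^⊤x with squared loss: the penalized form follows the gradient of the risk plus λ·½‖w‖², while the constrained form takes unregularized gradient steps and projects w back onto the ball ½‖w‖² ≤ s. The function names, learning rate, and step count are illustrative assumptions, not from the text.

```python
import numpy as np

def fit_penalized(X, y, lam=0.1, lr=0.01, steps=1000):
    # Penalized form of (4.3): minimize mean squared loss + lam * 0.5 * ||w||^2.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n + lam * w  # risk gradient + penalty gradient
        w -= lr * grad
    return w

def fit_constrained(X, y, s=1.0, lr=0.01, steps=1000):
    # Equivalent constrained form: minimize the empirical risk subject to
    # 0.5 * ||w||^2 <= s, via projected gradient descent.
    n, d = X.shape
    w = np.zeros(d)
    radius = np.sqrt(2 * s)                 # 0.5||w||^2 <= s  <=>  ||w|| <= sqrt(2s)
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / n # unregularized risk gradient
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm              # project back into the ball
    return w
```

For a convex loss, each λ corresponds to a radius s that yields the same minimizer; the projected version simply makes the ball picture explicit.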
Importantly, for semi-supervised learning, one can often define the regularizer Ω(f) using
the unlabeled data. For example,

    \Omega(f) = \Omega_{SL}(f) + \lambda \, \Omega_{SSL}(f),    (4.4)

where Ω_SL(f) is a supervised regularizer, and Ω_SSL(f) is a semi-supervised regularizer that depends
on the unlabeled data. When Ω_SSL(f) indeed fits the task, such regularization can produce a better
f than that produced by Ω_SL(f) alone. We will next show how to define Ω_SSL(f) to encourage
agreement among multiple hypotheses, and discuss other forms of Ω_SSL(f) in later chapters.
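As a small illustration of (4.4), the sketch below assembles a composite regularizer for two linear hypotheses, using the squared-norm penalty as Ω_SL and, anticipating the agreement idea introduced next, a disagreement measure on unlabeled data as Ω_SSL. The names omega_sl / omega_ssl and the particular disagreement measure are assumptions for illustration.

```python
import numpy as np

def omega_sl(w):
    # Supervised regularizer: the squared-norm penalty 0.5 * ||w||^2.
    return 0.5 * np.dot(w, w)

def omega_ssl(w1, w2, X_u):
    # Semi-supervised regularizer built from unlabeled data X_u:
    # mean squared disagreement between two linear hypotheses.
    return np.mean((X_u @ w1 - X_u @ w2) ** 2)

def omega(w1, w2, X_u, lam=0.5):
    # Composite regularizer of Eq. (4.4): Omega_SL + lam * Omega_SSL,
    # with the supervised penalty applied to each hypothesis.
    return omega_sl(w1) + omega_sl(w2) + lam * omega_ssl(w1, w2, X_u)
```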
We are now ready to introduce multiview learning. We assume the algorithm has access to
k separate learners. It is possible, but not necessary, for each learner to use a subset of the features
of an instance x. This is the generalization of Co-Training to k views, hence the name multiview.
Alternatively, the learners might be of different types (e.g., decision tree, neural network, etc.) but
take the same features of x as input. This is similar to the so-called ensemble method. In either
case, the goal is for the k learners to produce hypotheses f_1, ..., f_k that minimize the following
regularized risk:
    (f_1, \ldots, f_k) = \operatorname*{argmin}_{f_1, \ldots, f_k} \sum_{v=1}^{k} \left[ \sum_{i=1}^{l} c(x_i, y_i, f_v(x_i)) + \lambda_1 \Omega_{SL}(f_v) \right]
                         + \lambda_2 \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i)).    (4.5)
The intuition is that each hypothesis should not only minimize its own empirical risk, but also agree
with all the other hypotheses. The first part of the multiview regularized risk is simply the sum
of individual (supervised) regularized risks. The second part defines a semi-supervised regularizer,
which measures the disagreement of those k hypotheses on unlabeled instances:
    \Omega_{SSL}(f_1, \ldots, f_k) = \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i)).    (4.6)
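Putting (4.5) and (4.6) together, here is a sketch that fits k linear hypotheses on the same features (the ensemble-style variant described above), using squared loss for both the supervised and the disagreement terms, by joint gradient descent. This is one plausible instantiation under those assumptions, not the book's prescribed algorithm; all names and hyperparameters are illustrative.

```python
import numpy as np

def multiview_fit(X_l, y, X_u, k=3, lam1=0.1, lam2=0.01, lr=1e-3, steps=2000, seed=0):
    # Joint gradient descent on Eq. (4.5) with linear hypotheses
    # f_v(x) = w_v^T x and squared loss c(x, y, f(x)) = (y - f(x))^2.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(k, X_l.shape[1]))  # one weight row per learner
    for _ in range(steps):
        P = X_u @ W.T               # predictions on unlabeled points, shape (n_u, k)
        mean_p = P.mean(axis=1)     # average prediction across the k learners
        G = np.empty_like(W)
        for v in range(k):
            # Supervised part: squared loss on labeled data + lam1 * 0.5 * ||w_v||^2.
            G[v] = 2 * X_l.T @ (X_l @ W[v] - y) + lam1 * W[v]
            # Disagreement part, Eq. (4.6): each pair (u, v) appears in both orders,
            # so the gradient w.r.t. w_v is 4k * X_u^T (f_v's predictions - mean).
            G[v] += lam2 * 4 * k * X_u.T @ (P[:, v] - mean_p)
        W -= lr * G
    return W

# Toy usage with hypothetical data: 20 labeled, 100 unlabeled points in 5 dimensions.
rng = np.random.default_rng(1)
X_l, y, X_u = rng.normal(size=(20, 5)), rng.normal(size=20), rng.normal(size=(100, 5))
W = multiview_fit(X_l, y, X_u)      # W[v] is hypothesis f_v
```

With lam2 = 0 the learners decouple into k independent regularized fits; increasing lam2 trades labeled-data fit for agreement on the unlabeled data.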