string (x^(2)). The two classifiers teach each other. One can formalize this process into a Co-Training algorithm.
Algorithm 4.1. Co-Training.
Input: labeled data {(x_i, y_i)}_{i=1}^{l}, unlabeled data {x_j}_{j=l+1}^{l+u}, a learning speed k.
Each instance has two views x_i = [x_i^(1), x_i^(2)].
1. Initially let the training sample be L_1 = L_2 = {(x_1, y_1), ..., (x_l, y_l)}.
2. Repeat until unlabeled data is used up:
3.   Train a view-1 classifier f^(1) from L_1, and a view-2 classifier f^(2) from L_2.
4.   Classify the remaining unlabeled data with f^(1) and f^(2) separately.
5.   Add f^(1)'s top k most-confident predictions (x, f^(1)(x)) to L_2.
     Add f^(2)'s top k most-confident predictions (x, f^(2)(x)) to L_1.
     Remove these from the unlabeled data.
Note f^(1) is a view-1 classifier: although we give it the complete feature x, it only pays attention to the first view x^(1) and ignores the second view x^(2). f^(2) is the other way around. They each provide their most confident unlabeled-data predictions as the training data for the other view. In this process, the unlabeled data is eventually exhausted.
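The algorithm is short enough to sketch directly. The following is a minimal Python sketch of Algorithm 4.1, not a reference implementation: it assumes scikit-learn-style base learners whose predict_proba output serves as the confidence score, and the helper name co_training is an illustrative choice.

```python
# A minimal sketch of Algorithm 4.1 (Co-Training) over two feature views.
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, U1, U2, k=1,
                base1=GaussianNB(), base2=GaussianNB()):
    """X1, X2: the two views of the labeled data; y: their labels.
    U1, U2: the two views of the unlabeled data; k: learning speed."""
    # Step 1: both training samples start as the labeled data.
    L1, L2 = (X1.copy(), y.copy()), (X2.copy(), y.copy())
    U1, U2 = U1.copy(), U2.copy()
    f1, f2 = clone(base1), clone(base2)

    # Step 2: repeat until the unlabeled data is used up.
    while len(U1) > 0:
        # Step 3: train each view's classifier on its own training sample.
        f1.fit(*L1)
        f2.fit(*L2)
        # Step 4: classify the remaining unlabeled data with each classifier.
        p1, p2 = f1.predict_proba(U1), f2.predict_proba(U2)
        # Step 5: each classifier's top-k most confident predictions become
        # training data for the *other* view; the chosen instances are then
        # removed from the unlabeled pool.
        top1 = np.argsort(p1.max(axis=1))[-k:]
        top2 = np.argsort(p2.max(axis=1))[-k:]
        L2 = (np.vstack([L2[0], U2[top1]]),
              np.concatenate([L2[1], f1.classes_[p1[top1].argmax(axis=1)]]))
        L1 = (np.vstack([L1[0], U1[top2]]),
              np.concatenate([L1[1], f2.classes_[p2[top2].argmax(axis=1)]]))
        keep = np.setdiff1d(np.arange(len(U1)), np.union1d(top1, top2))
        U1, U2 = U1[keep], U2[keep]
    return f1, f2
```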
Co-Training is a wrapper method. That is to say, it does not matter what the learning algorithms are for the two classifiers f^(1) and f^(2). The only requirement is that the classifiers can assign a confidence score to their predictions. The confidence score is used to select which unlabeled instances to turn into additional training data for the other view. Being a wrapper method, Co-Training is widely applicable to many tasks.
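To illustrate the wrapper property, the sketch above can be driven by two entirely different confidence-producing learners. The data below is synthetic, and the call assumes the hypothetical co_training helper from the previous sketch.

```python
# Illustrative usage: two different base learners, one per view.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) + np.repeat([0, 1], 100)[:, None]  # synthetic
y = np.repeat([0, 1], 100)
X1, X2 = X[:, :5], X[:, 5:]                       # the two views

lab = np.concatenate([rng.choice(100, 5, replace=False),
                      100 + rng.choice(100, 5, replace=False)])   # 10 labeled
unlab = np.setdiff1d(np.arange(200), lab)

f1, f2 = co_training(X1[lab], X2[lab], y[lab], X1[unlab], X2[unlab], k=5,
                     base1=GaussianNB(),
                     base2=LogisticRegression(max_iter=1000))
```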
4.3 THE ASSUMPTIONS OF CO-TRAINING
Co-Training makes several assumptions. The most obvious one is the existence of two separate views x = [x^(1), x^(2)]. For a general task, the features may not naturally split into two views. To apply Co-Training in this case, one can randomly split the features into two artificial views (a small sketch of such a split follows Remark 4.2). Assuming there are two views, the success of Co-Training depends on the following two assumptions:
Remark 4.2. Co-Training Assumptions
1. Each view alone is sufficient to make good classifications, given enough labeled data.
2. The two views are conditionally independent given the class label.
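As noted above, when the features do not split naturally one can form two artificial views by randomly splitting the feature columns. Below is a minimal sketch of such a split; the helper name split_views is an illustrative choice.

```python
# A sketch of creating two artificial views via a random split of the
# feature columns, for tasks with no natural feature split.
import numpy as np

def split_views(X, seed=0):
    """Randomly partition the columns of X into two artificial views."""
    perm = np.random.default_rng(seed).permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, perm[:half]], X[:, perm[half:]]
```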
The first assumption is easy to understand. It not only requires that there are two views, but
two good ones. The second assumption is subtle but strong. It states that
P(x^(1) | y, x^(2)) = P(x^(1) | y)
P(x^(2) | y, x^(1)) = P(x^(2) | y).    (4.1)