string (x^(2)). The two classifiers teach each other. One can formalize this process into a Co-Training algorithm.
Algorithm 4.1. Co-Training.
Input: labeled data {(x_i, y_i)}_{i=1}^{l}, unlabeled data {x_j}_{j=l+1}^{l+u}, a learning speed k.
Each instance has two views x_i = [x_i^(1), x_i^(2)].
1. Initially let the training sample be L_1 = L_2 = {(x_1, y_1), ..., (x_l, y_l)}.
2. Repeat until unlabeled data is used up:
3.   Train a view-1 classifier f^(1) from L_1, and a view-2 classifier f^(2) from L_2.
4.   Classify the remaining unlabeled data with f^(1) and f^(2) separately.
5.   Add f^(1)'s top k most-confident predictions (x, f^(1)(x)) to L_2.
     Add f^(2)'s top k most-confident predictions (x, f^(2)(x)) to L_1.
     Remove these from the unlabeled data.
Note f^(1) is a view-1 classifier: although we give it the complete feature x, it only pays attention to the first view x^(1) and ignores the second view x^(2). f^(2) is the other way around. They each provide their most confident unlabeled-data predictions as the training data for the other view. In this process, the unlabeled data is eventually exhausted.
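The algorithm is short enough to sketch directly. The following is a minimal Python sketch of Algorithm 4.1, not a reference implementation: it assumes scikit-learn-style base learners whose predict_proba output serves as the confidence score, and the helper name co_training is an illustrative choice.

```python
# A minimal sketch of Algorithm 4.1 (Co-Training) over two feature views.
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, U1, U2, k=1,
                base1=GaussianNB(), base2=GaussianNB()):
    """X1, X2: the two views of the labeled data; y: their labels.
    U1, U2: the two views of the unlabeled data; k: learning speed."""
    # Step 1: both training samples start as the labeled data.
    L1, L2 = (X1.copy(), y.copy()), (X2.copy(), y.copy())
    U1, U2 = U1.copy(), U2.copy()
    f1, f2 = clone(base1), clone(base2)

    # Step 2: repeat until the unlabeled data is used up.
    while len(U1) > 0:
        # Step 3: train each view's classifier on its own training sample.
        f1.fit(*L1)
        f2.fit(*L2)
        # Step 4: classify the remaining unlabeled data with each classifier.
        p1, p2 = f1.predict_proba(U1), f2.predict_proba(U2)
        # Step 5: each classifier's top-k most confident predictions become
        # training data for the *other* view; the chosen instances are then
        # removed from the unlabeled pool.
        top1 = np.argsort(p1.max(axis=1))[-k:]
        top2 = np.argsort(p2.max(axis=1))[-k:]
        L2 = (np.vstack([L2[0], U2[top1]]),
              np.concatenate([L2[1], f1.classes_[p1[top1].argmax(axis=1)]]))
        L1 = (np.vstack([L1[0], U1[top2]]),
              np.concatenate([L1[1], f2.classes_[p2[top2].argmax(axis=1)]]))
        keep = np.setdiff1d(np.arange(len(U1)), np.union1d(top1, top2))
        U1, U2 = U1[keep], U2[keep]
    return f1, f2
```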
Co-Training is a wrapper method. That is to say, it does not matter what the learning algorithms are for the two classifiers f^(1) and f^(2). The only requirement is that the classifiers can assign a confidence score to their predictions. The confidence score is used to select which unlabeled instances to turn into additional training data for the other view. Being a wrapper method, Co-Training is widely applicable to many tasks.
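To illustrate the wrapper property, the sketch above can be driven by two entirely different confidence-producing learners. The data below is synthetic, and the call assumes the hypothetical co_training helper from the previous sketch.

```python
# Illustrative usage: two different base learners, one per view.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) + np.repeat([0, 1], 100)[:, None]  # synthetic
y = np.repeat([0, 1], 100)
X1, X2 = X[:, :5], X[:, 5:]                       # the two views

lab = np.concatenate([rng.choice(100, 5, replace=False),
                      100 + rng.choice(100, 5, replace=False)])   # 10 labeled
unlab = np.setdiff1d(np.arange(200), lab)

f1, f2 = co_training(X1[lab], X2[lab], y[lab], X1[unlab], X2[unlab], k=5,
                     base1=GaussianNB(),
                     base2=LogisticRegression(max_iter=1000))
```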
4.3 THE ASSUMPTIONS OF CO-TRAINING
Co-Training makes several assumptions. The most obvious one is the existence of two separate views x = [x^(1), x^(2)]. For a general task, the features may not naturally split into two views. To apply Co-Training in this case, one can randomly split the features into two artificial views (a small sketch of such a split follows Remark 4.2). Assuming there are two views, the success of Co-Training depends on the following two assumptions:
Remark 4.2. Co-Training Assumptions
1. Each view alone is sufficient to make good classifications, given enough labeled data.
2. The two views are conditionally independent given the class label.
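As noted above, when the features do not split naturally one can form two artificial views by randomly splitting the feature columns. Below is a minimal sketch of such a split; the helper name split_views is an illustrative choice.

```python
# A sketch of creating two artificial views via a random split of the
# feature columns, for tasks with no natural feature split.
import numpy as np

def split_views(X, seed=0):
    """Randomly partition the columns of X into two artificial views."""
    perm = np.random.default_rng(seed).permutation(X.shape[1])
    half = X.shape[1] // 2
    return X[:, perm[:half]], X[:, perm[half:]]
```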
The first assumption is easy to understand. It not only requires that there are two views, but
two good ones. The second assumption is subtle but strong. It states that
P(x^(1) | y, x^(2)) = P(x^(1) | y)
P(x^(2) | y, x^(1)) = P(x^(2) | y).    (4.1)