5. Remove a subset S from U; add {(x, f(x)) | x ∈ S} to L.
The main idea is to first train f on labeled data. The function f is then used to predict the
labels for the unlabeled data. A subset S of the unlabeled data, together with their predicted labels,
is then selected to augment the labeled data. Typically, S consists of the few unlabeled instances
with the most confident f predictions. The function f is re-trained on the now larger set of labeled
data, and the procedure repeats. It is also possible for S to be the whole unlabeled data set. In this
case, L and U remain the whole training sample, but the assigned labels on unlabeled instances
might vary from iteration to iteration.
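As a concrete illustration of this loop, the following Python sketch uses scikit-learn-style fit/predict_proba conventions; the function name self_train, the confidence_threshold parameter, and the stopping rules are illustrative assumptions rather than a prescribed implementation.

import numpy as np

def self_train(learner, X_labeled, y_labeled, X_unlabeled,
               confidence_threshold=0.95, max_iterations=20):
    """Self-training wrapper: repeatedly train the learner, label the most
    confident unlabeled instances, and add them to the labeled set."""
    X_l, y_l = np.asarray(X_labeled), np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled)
    for _ in range(max_iterations):
        if len(X_u) == 0:
            break
        learner.fit(X_l, y_l)                        # train f on the current labeled data L
        proba = learner.predict_proba(X_u)           # use f to predict labels for U
        labels = learner.classes_[proba.argmax(axis=1)]
        S = proba.max(axis=1) >= confidence_threshold    # S = most confident predictions
        if not S.any():
            break
        X_l = np.vstack([X_l, X_u[S]])               # add (x, f(x)) for x in S to L
        y_l = np.concatenate([y_l, labels[S]])
        X_u = X_u[~S]                                # remove S from U
    learner.fit(X_l, y_l)                            # re-train f on the final labeled set
    return learner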
Remark 2.5. Self-Training Assumption The assumption of self-training is that its own predic-
tions, at least the high confidence ones, tend to be correct. This is likely to be the case when the
classes form well-separated clusters.
The major advantages of self-training are its simplicity and the fact that it is a wrapper method.
This means that the choice of learner for f in step 3 is left completely open. For example, the learner
can be a simple k NN algorithm, or a very complicated classifier. The self-training procedure “wraps”
around the learner without changing its inner workings. This is important for many real world tasks
like natural language processing, where the learners can be complicated black boxes not amenable
to changes.
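Because the wrapper leaves the learner untouched, swapping f is a one-line change. Assuming the hypothetical self_train sketch above and scikit-learn estimators (the data variables are placeholders), either of the following would work:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# A simple kNN learner ...
f_knn = self_train(KNeighborsClassifier(n_neighbors=3),
                   X_labeled, y_labeled, X_unlabeled)

# ... or a more complex black-box classifier, with no change to the wrapper itself.
f_forest = self_train(RandomForestClassifier(n_estimators=200),
                       X_labeled, y_labeled, X_unlabeled)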
On the other hand, it is conceivable that an early mistake made by f (which is not perfect
to start with, due to a small initial L ) can reinforce itself by generating incorrectly labeled data.
Re-training with this data will lead to an even worse f in the next iteration. Various heuristics have
been proposed to alleviate this problem.
Example 2.6. As a concrete example of self-training, we now introduce an algorithm we call
propagating 1-nearest-neighbor and illustrate it using the little green aliens data.
Algorithm 2.7. Propagating 1-Nearest-Neighbor.
Input: labeled data {(x_i, y_i)}_{i=1}^{l}, unlabeled data {x_j}_{j=l+1}^{l+u}, distance function d().
1. Initially, let L = {(x_i, y_i)}_{i=1}^{l} and U = {x_j}_{j=l+1}^{l+u}.
2. Repeat until U is empty:
3. Select x = argmin_{x ∈ U} min_{x' ∈ L} d(x, x').
4. Set f(x) to the label of x's nearest instance in L. Break ties randomly.
5. Remove x from U; add (x, f(x)) to L.
This algorithm wraps around a 1-nearest-neighbor classifier. In each iteration, it selects the
unlabeled instance that is closest to any “labeled” instance (i.e., any instance currently in L, some of
which were labeled by previous iterations). The algorithm approximates confidence by the distance
to the nearest instance in L.
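A short sketch of Algorithm 2.7 follows, using plain NumPy and assuming Euclidean distance for d(); the function name propagating_1nn and the returned dictionary of predictions are illustrative choices, and ties are broken by argmin order here rather than randomly.

import numpy as np

def propagating_1nn(X_labeled, y_labeled, X_unlabeled):
    """Propagating 1-nearest-neighbor: repeatedly label the unlabeled point
    closest to the current labeled set L and move it from U into L."""
    L_X = [np.asarray(x, dtype=float) for x in X_labeled]
    L_y = list(y_labeled)
    U = [np.asarray(x, dtype=float) for x in X_unlabeled]
    predictions = {}
    while U:                                          # 2. repeat until U is empty
        # Distance from every x in U to every instance currently in L.
        dists = np.array([[np.linalg.norm(x - z) for z in L_X] for x in U])
        nearest_in_L = dists.argmin(axis=1)           # nearest labeled instance for each x in U
        i = dists.min(axis=1).argmin()                # 3. x = argmin_{x in U} min_{x' in L} d(x, x')
        label = L_y[nearest_in_L[i]]                  # 4. f(x) = label of x's nearest instance in L
        x = U.pop(i)                                  # 5. remove x from U ...
        L_X.append(x)                                 # ... and add (x, f(x)) to L
        L_y.append(label)
        predictions[tuple(x)] = label
    return predictions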