assumption. The average test sample error rate here is 29.6%, with an average decision boundary of
0.01 (standard deviation 0.48). Despite the wrong model assumption, this approach uses knowledge
that the two classes contain roughly the same number of instances, so the decision boundaries are
drawn toward the center. This might explain the surprisingly good performance compared to the
correct model.
The third approach is a graph-based model (Chapter 5), with a typical way to generate the
graph: any two instances in the labeled and unlabeled data are connected by an edge. The edge
weight is large if the two instances are close to each other, and small if they are far away. The
model assumption is that instances connected with large-weight edges tend to have the same label.
However, in this particular example where the two classes overlap, instances from different classes
can be quite close and connected by large-weight edges. Therefore, the model assumption does not
match the task either. The results using this model are shown in Figure 2.2 as thin solid lines.² The
graph-based models' average test sample error rate is 36.4%, with an average decision boundary at
0.03 (standard deviation 1.23). The graph-based model is inappropriate for this task and performs
even worse than supervised learning.
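To make the graph-based approach concrete, here is a minimal sketch of the model just described: a fully connected graph over the labeled and unlabeled 1-D instances with Gaussian edge weights (the weight function and σ = 0.1 follow footnote 2 below), and labels for the unlabeled instances obtained from the closed-form harmonic function solution. The function name, the binary 0/1 label encoding, and the 0.5 threshold for converting real-valued scores back to labels are illustrative choices, not part of the text.

```python
import numpy as np

def harmonic_label_propagation(x_labeled, y_labeled, x_unlabeled, sigma=0.1):
    """Gaussian-weighted graph + closed-form harmonic function solution.

    1-D sketch; labels are assumed to be 0/1.  Edge weights are
    w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with sigma = 0.1,
    as in footnote 2.
    """
    x = np.concatenate([x_labeled, x_unlabeled])
    l = len(x_labeled)

    # Fully connected graph with Gaussian edge weights.
    diff = x[:, None] - x[None, :]
    W = np.exp(-(diff ** 2) / (2 * sigma ** 2))

    # Combinatorial graph Laplacian L = D - W, partitioned into
    # labeled (first l) and unlabeled (remaining) blocks.
    D = np.diag(W.sum(axis=1))
    L = D - W
    L_uu = L[l:, l:]
    L_ul = L[l:, :l]

    # Harmonic function on the unlabeled nodes: f_u = -L_uu^{-1} L_ul y_l.
    f_u = -np.linalg.solve(L_uu, L_ul @ np.asarray(y_labeled, dtype=float))
    return (f_u > 0.5).astype(int)   # threshold real-valued scores at 0.5
```

On the 1-D example above, `x_labeled` would hold the few labeled points from each class and `x_unlabeled` the remaining points; the decision boundary can then be read off as the x value where the returned labels change, as described in footnote 2.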
As the above example shows, the model assumption plays an important role in semi-supervised
learning. It makes up for the lack of labeled data, and can determine the quality of the predictor.
However, making the right assumptions (or detecting wrong assumptions) remains an open question
in semi-supervised learning. This means the question “which semi-supervised model should I use?”
does not have an easy answer. Consequently, this book will mainly present methodology. Most
chapters will introduce a distinct family of semi-supervised learning models. We start with a simple
semi-supervised classification model: self-training.
2.5 SELF-TRAINING MODELS
Self-training is characterized by the fact that the learning process uses its own predictions to teach
itself. For this reason, it is also called self-teaching or bootstrapping (not to be confused with the
statistical procedure with the same name). Self-training can be either inductive or transductive,
depending on the nature of the predictor f .
Algorithm 2.4. Self-training.
Input: labeled data $\{(x_i, y_i)\}_{i=1}^{l}$, unlabeled data $\{x_j\}_{j=l+1}^{l+u}$.
1. Initially, let $L = \{(x_i, y_i)\}_{i=1}^{l}$ and $U = \{x_j\}_{j=l+1}^{l+u}$.
2. Repeat:
3. Train f from L using supervised learning.
4. Apply f to the unlabeled instances in U.
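The loop in Algorithm 2.4 can be sketched in a few lines of code. The excerpt above ends after step 4, so the rule for choosing which pseudo-labeled instances to move from U to L (here, the most confident predictions of f each round) is an assumption, as are the scikit-learn-style classifier interface (fit / predict_proba), the feature vectors as array rows, and the parameter names.

```python
import numpy as np
from copy import deepcopy

def self_train(model, x_labeled, y_labeled, x_unlabeled,
               n_per_round=10, max_rounds=100):
    """Sketch of a self-training loop in the spirit of Algorithm 2.4."""
    # Step 1: initialize L with the labeled data and U with the unlabeled data.
    L_x, L_y = list(x_labeled), list(y_labeled)
    U = list(x_unlabeled)

    f = deepcopy(model)
    for _ in range(max_rounds):                          # step 2: repeat
        f.fit(np.asarray(L_x), np.asarray(L_y))          # step 3: train f from L
        if not U:
            break
        proba = f.predict_proba(np.asarray(U))           # step 4: apply f to U
        confidence = proba.max(axis=1)
        picked = np.argsort(confidence)[-n_per_round:]   # most confident instances (an assumed selection rule)
        for idx in sorted(picked, reverse=True):         # pop from the back so earlier indices stay valid
            L_x.append(U.pop(idx))
            L_y.append(int(proba[idx].argmax()))         # f teaches itself with its own prediction
    return f
```

Depending on how the classifier f and the selection rule are chosen, the returned predictor can be used inductively (applied to new instances) or the pseudo-labels on U can be read off directly, matching the inductive/transductive distinction noted above.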
² The graph-based model used here featured a Gaussian-weighted graph ($w_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, with $\sigma = 0.1$), and predictions were made using the closed-form harmonic function solution. While this is a transductive method, we calculate the boundary as the value on the x-axis where the predicted label changes.