assumption. The average test sample error rate here is 29.6%, with an average decision boundary of
0.01 (standard deviation 0.48). Despite the wrong model assumption, this approach uses knowledge
that the two classes contain roughly the same number of instances, so the decision boundaries are
drawn toward the center. This might explain the surprisingly good performance compared to the
correct model.
The third approach is a graph-based model (Chapter 5), with a typical way to generate the
graph: any two instances in the labeled and unlabeled data are connected by an edge. The edge
weight is large if the two instances are close to each other, and small if they are far away. The
model assumption is that instances connected with large-weight edges tend to have the same label.
However, in this particular example where the two classes overlap, instances from different classes
can be quite close and connected by large-weight edges. Therefore, the model assumption does not
match the task either. The results using this model are shown in Figure 2.2 as thin solid lines.² The
graph-based models' average test sample error rate is 36.4%, with an average decision boundary at
0.03 (standard deviation 1.23). The graph-based model is inappropriate for this task and performs
even worse than supervised learning.
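To make the graph-based approach concrete, here is a minimal sketch of the model just described: a fully connected graph over the labeled and unlabeled 1-D instances with Gaussian edge weights (the weight function and σ = 0.1 follow footnote 2 below), and labels for the unlabeled instances obtained from the closed-form harmonic function solution. The function name, the binary 0/1 label encoding, and the 0.5 threshold for converting real-valued scores back to labels are illustrative choices, not part of the text.

```python
import numpy as np

def harmonic_label_propagation(x_labeled, y_labeled, x_unlabeled, sigma=0.1):
    """Gaussian-weighted graph + closed-form harmonic function solution.

    1-D sketch; labels are assumed to be 0/1.  Edge weights are
    w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with sigma = 0.1,
    as in footnote 2.
    """
    x = np.concatenate([x_labeled, x_unlabeled])
    l = len(x_labeled)

    # Fully connected graph with Gaussian edge weights.
    diff = x[:, None] - x[None, :]
    W = np.exp(-(diff ** 2) / (2 * sigma ** 2))

    # Combinatorial graph Laplacian L = D - W, partitioned into
    # labeled (first l) and unlabeled (remaining) blocks.
    D = np.diag(W.sum(axis=1))
    L = D - W
    L_uu = L[l:, l:]
    L_ul = L[l:, :l]

    # Harmonic function on the unlabeled nodes: f_u = -L_uu^{-1} L_ul y_l.
    f_u = -np.linalg.solve(L_uu, L_ul @ np.asarray(y_labeled, dtype=float))
    return (f_u > 0.5).astype(int)   # threshold real-valued scores at 0.5
```

On the 1-D example above, `x_labeled` would hold the few labeled points from each class and `x_unlabeled` the remaining points; the decision boundary can then be read off as the x value where the returned labels change, as described in footnote 2.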
As the above example shows, the model assumption plays an important role in semi-supervised
learning. It makes up for the lack of labeled data, and can determine the quality of the predictor.
However, making the right assumptions (or detecting wrong assumptions) remains an open question
in semi-supervised learning. This means the question “which semi-supervised model should I use?”
does not have an easy answer. Consequently, this book will mainly present methodology. Most
chapters will introduce a distinct family of semi-supervised learning models. We start with a simple
semi-supervised classification model: self-training.
2.5 SELF-TRAINING MODELS
Self-training is characterized by the fact that the learning process uses its own predictions to teach
itself. For this reason, it is also called self-teaching or bootstrapping (not to be confused with the
statistical procedure with the same name). Self-training can be either inductive or transductive,
depending on the nature of the predictor f .
Algorithm 2.4. Self-training.
Input: labeled data $\{(x_i, y_i)\}_{i=1}^{l}$, unlabeled data $\{x_j\}_{j=l+1}^{l+u}$.
1. Initially, let $L = \{(x_i, y_i)\}_{i=1}^{l}$ and $U = \{x_j\}_{j=l+1}^{l+u}$.
2. Repeat:
3. Train f from L using supervised learning.
4. Apply f to the unlabeled instances in U.
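The loop in Algorithm 2.4 can be sketched in a few lines of code. The excerpt above ends after step 4, so the rule for choosing which pseudo-labeled instances to move from U to L (here, the most confident predictions of f each round) is an assumption, as are the scikit-learn-style classifier interface (fit / predict_proba), the feature vectors as array rows, and the parameter names.

```python
import numpy as np
from copy import deepcopy

def self_train(model, x_labeled, y_labeled, x_unlabeled,
               n_per_round=10, max_rounds=100):
    """Sketch of a self-training loop in the spirit of Algorithm 2.4."""
    # Step 1: initialize L with the labeled data and U with the unlabeled data.
    L_x, L_y = list(x_labeled), list(y_labeled)
    U = list(x_unlabeled)

    f = deepcopy(model)
    for _ in range(max_rounds):                          # step 2: repeat
        f.fit(np.asarray(L_x), np.asarray(L_y))          # step 3: train f from L
        if not U:
            break
        proba = f.predict_proba(np.asarray(U))           # step 4: apply f to U
        confidence = proba.max(axis=1)
        picked = np.argsort(confidence)[-n_per_round:]   # most confident instances (an assumed selection rule)
        for idx in sorted(picked, reverse=True):         # pop from the back so earlier indices stay valid
            L_x.append(U.pop(idx))
            L_y.append(int(proba[idx].argmax()))         # f teaches itself with its own prediction
    return f
```

Depending on how the classifier f and the selection rule are chosen, the returned predictor can be used inductively (applied to new instances) or the pseudo-labels on U can be read off directly, matching the inductive/transductive distinction noted above.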
² The graph-based model used here featured a Gaussian-weighted graph ($w_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, with $\sigma = 0.1$), and predictions were made using the closed-form harmonic function solution. While this is a transductive method, we calculate the boundary as the value on the x-axis where the predicted label changes.