2.4 CAVEATS
It seems reasonable that semi-supervised learning can use additional unlabeled data, which by itself does not carry information on the mapping X → Y, to learn a better predictor f. As mentioned earlier, the key lies in the semi-supervised model assumptions about the link between the marginal distribution P(x) and the conditional distribution P(y | x). There are several different
semi-supervised learning methods, and each makes slightly different assumptions about this link.
These methods include self-training, probabilistic generative models, co-training, graph-based mod-
els, semi-supervised support vector machines, and so on. In the next several chapters, we will go
through these models and discuss their assumptions. In Section 8.2, we will also give some theoretical
justification. Empirically, these semi-supervised learning models do produce better classifiers than
supervised learning on some data sets.
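To make the first of these methods concrete, here is a minimal sketch of self-training: a base classifier is fit on the labeled data, its most confident predictions on the unlabeled data are converted into pseudo-labels, and the process repeats. The logistic-regression base learner, the 0.95 confidence threshold, and the iteration cap are illustrative assumptions, not choices made in the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_iter=10):
    """Self-training sketch: grow the labeled set with confident pseudo-labels."""
    clf = LogisticRegression()
    for _ in range(max_iter):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        conf = proba.max(axis=1)            # confidence of the predicted label
        mask = conf >= threshold            # trust only confident predictions
        if not mask.any():
            break                           # nothing confident left to add
        pseudo = clf.classes_[proba[mask].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[mask]])   # absorb pseudo-labeled points
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~mask]
    return clf
```

Note the sketch's implicit link assumption: that the classifier's high-confidence predictions on unlabeled data are correct. When that assumption fails, the pseudo-labels reinforce early mistakes, which is exactly the caveat discussed next.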
However, it is worth pointing out that blindly selecting a semi-supervised learning method
for a specific task will not necessarily improve performance over supervised learning. In fact, unla-
beled data can lead to worse performance with the wrong link assumptions. The following example
demonstrates this sensitivity to model assumptions by comparing supervised learning performance
with several semi-supervised learning approaches on a simple classification problem. Don't worry if
these approaches appear mysterious; we will explain how they work in detail in the rest of the book.
For now, the main point is that semi-supervised learning performance depends on the correctness
of the assumptions made by the model in question.
Example 2.3. Consider a classification task where there are two classes, each with a Gaussian
distribution. The two Gaussian distributions heavily overlap (top panel of Figure 2.2). The true
decision boundary lies in the middle of the two distributions, shown as a dotted line. Since we know
the true distributions, we can compute test sample error rates based on the probability mass of each
Gaussian that falls on the incorrect side of the decision boundary. Due to the overlapping class
distributions, the optimal error rate (i.e., the Bayes error) is 21.2%.
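The error-rate computation just described can be carried out in a few lines. The specific parameters below (unit-variance Gaussians with means at ±0.8 and equal class priors) are assumptions: the text does not state them, but they reproduce the stated 21.2% Bayes error and a true boundary at 0.

```python
from scipy.stats import norm

mu = 0.8          # assumed class means at -mu and +mu (not stated in the text)
boundary = 0.0    # true decision boundary: midpoint of the two means

# Probability mass of each Gaussian falling on the wrong side of the boundary;
# with equal priors, the Bayes error is the average of the two masses.
err_pos = norm.cdf(boundary, loc=+mu, scale=1.0)        # class +1 mass below boundary
err_neg = 1.0 - norm.cdf(boundary, loc=-mu, scale=1.0)  # class -1 mass above boundary
print(f"Bayes error: {0.5 * (err_pos + err_neg):.1%}")  # -> 21.2%
```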
For supervised learning, the learned decision boundary is in the middle of the two labeled
instances, and the unlabeled instances are ignored. See, for example, the thick solid line in the second
panel of Figure 2.2. We note that it lies away from the true decision boundary because the two labeled
instances are randomly sampled. If we were to draw two other labeled instances, the learned decision
boundary would change, but most likely would still be off (see other panels of Figure 2.2). On average,
the expected learned decision boundary will coincide with the true boundary, but for any given draw
of labeled data it will be off quite a bit. We say that the learned boundary has high variance. To
evaluate supervised learning, and the semi-supervised learning methods introduced below, we drew
1000 training samples, each with one labeled and 99 unlabeled instances per class. Whereas the optimal decision boundary achieves the Bayes error of 21.2%, the decision boundaries found by supervised learning have an average test sample error rate of 31.6%. The average learned boundary lies at 0.02 (compared to the optimal boundary at 0), but has a standard deviation of 0.72.
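This high-variance behavior is easy to reproduce in simulation. Under the same assumed parameters as above (means ±0.8, unit variance), the midpoint of one labeled instance per class has standard deviation √2/2 ≈ 0.71, close to the reported 0.72; the exact error figures depend on parameters and test-sampling details the text does not give, so the numbers below will only roughly match the reported ones.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, trials = 0.8, 1000          # assumed means +/- mu; 1000 draws as in the text

# Supervised boundary: midpoint of the single labeled instance from each class.
x_neg = rng.normal(-mu, 1.0, size=trials)
x_pos = rng.normal(+mu, 1.0, size=trials)
boundaries = (x_neg + x_pos) / 2.0

# True test error of a threshold b: mass of each Gaussian on the wrong side.
def error_rate(b):
    return 0.5 * (norm.cdf(b, +mu, 1.0) + (1.0 - norm.cdf(b, -mu, 1.0)))

print(f"mean boundary: {boundaries.mean():+.2f}")  # near 0: unbiased on average
print(f"std of boundary: {boundaries.std():.2f}")  # ~0.71: high variance
print(f"avg error: {np.mean([error_rate(b) for b in boundaries]):.1%}")
```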
Now, without presenting the details, we show the learned decision boundaries of three semi-supervised learning models on the training data. These models will be presented in detail in later chapters.