1. Find the k training instances x_{i_1}, ..., x_{i_k} closest to x under distance d(·, ·).
2. Output y as the majority class of y_{i_1}, ..., y_{i_k}, breaking ties randomly.
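The two steps above can be sketched as a minimal Python implementation. The function name and the default Euclidean distance are our choices for illustration; the text leaves d() generic:

```python
import math
import random
from collections import Counter

def knn_classify(x, train_X, train_y, k, d=None):
    """k-nearest-neighbor classification, following the two steps above.

    1. Find the k training instances closest to x under distance d.
    2. Output the majority class among their labels, breaking ties randomly.
    """
    if d is None:
        d = math.dist  # Euclidean distance as a default choice
    # Step 1: indices of the k nearest training instances.
    neighbors = sorted(range(len(train_X)), key=lambda i: d(x, train_X[i]))[:k]
    # Step 2: majority vote over their labels, random tie-breaking.
    counts = Counter(train_y[i] for i in neighbors)
    top = max(counts.values())
    return random.choice([label for label, c in counts.items() if c == top])
```

Setting k = 1 gives the 1NN classifier used in the example below; each test point simply receives the label of its single nearest training instance.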
Being a D-dimensional feature vector, the test instance x can be viewed as a point in D-dimensional feature space. A classifier assigns a label to each point in the feature space. This divides
the feature space into decision regions within which points have the same label. The boundary
separating these regions is called the decision boundary induced by the classifier.
Example 1.14. Consider two classification tasks involving the little green aliens. In the first task
in Figure 1.3(a), the task is gender classification from weight and height. The symbols are training
data. Each training instance has a label: female (red cross) or male (blue circle). The decision regions
from a 1NN classifier are shown as white and gray. In the second task in Figure 1.3(b), the task is age
classification on the same sample of training instances. The training instances now have different
labels: juvenile (red cross) or adult (blue circle). Again, the decision regions of 1NN are shown.
Notice that, for the same training instances but different classification goals, the decision boundary
can be quite different. Naturally, this is a property unique to supervised learning, since unsupervised
learning does not use any particular set of labels at all.
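The point of the example can be sketched directly: labeling every grid point in feature space by its nearest training instance carves out the decision regions, and relabeling the same training points changes where the boundary falls. The four training points and their labels below are hypothetical stand-ins, not the book's 100-alien sample:

```python
import math

# Hypothetical (weight, height) training points with two alternative labelings.
train_X = [(85, 45), (95, 60), (105, 50), (110, 65)]
gender = ['male', 'female', 'male', 'female']        # task 1: gender
age    = ['juvenile', 'juvenile', 'adult', 'adult']  # task 2: age

def nn_label(x, labels):
    """1NN: label x by its single nearest training instance."""
    i = min(range(len(train_X)), key=lambda j: math.dist(x, train_X[j]))
    return labels[i]

# Labeling a coarse grid under each task reveals two different partitions
# of the same feature space (print first letters as a crude region map).
for labels in (gender, age):
    for h in range(40, 70, 5):
        print(' '.join(nn_label((w, h), labels)[0]
                       for w in range(80, 115, 5)))
    print()
```

A single point such as (86, 46) sits in the "male" region under the gender labeling but in the "juvenile" region under the age labeling, which is exactly the dependence on the chosen labels that makes this a supervised-learning phenomenon.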
[Figure 1.3: two scatter plots of height (40–70) vs. weight (lbs., 80–110); panel (a) shows the female/male decision regions, panel (b) the juvenile/adult decision regions.]
Figure 1.3: Classify by gender or age from a training sample of 100 little green aliens, with
1-nearest-neighbor decision regions shown.
In this chapter, we introduced statistical machine learning as a foundation for the rest of
the book. We presented the unsupervised and supervised learning settings, along with concrete
examples of each. In the next chapter, we provide an overview of semi-supervised learning, which
falls somewhere between these two. Each subsequent chapter will present specific families of semi-
supervised learning algorithms.