1. Find the k training instances x_{i_1}, ..., x_{i_k} closest to x under distance d(·, ·).
2. Output y as the majority class of y_{i_1}, ..., y_{i_k}, breaking ties randomly.
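The two steps above can be sketched as a minimal Python implementation. The function name and the default Euclidean distance are our choices for illustration; the text leaves d() generic:

```python
import math
import random
from collections import Counter

def knn_classify(x, train_X, train_y, k, d=None):
    """k-nearest-neighbor classification, following the two steps above.

    1. Find the k training instances closest to x under distance d.
    2. Output the majority class among their labels, breaking ties randomly.
    """
    if d is None:
        d = math.dist  # Euclidean distance as a default choice
    # Step 1: indices of the k nearest training instances.
    neighbors = sorted(range(len(train_X)), key=lambda i: d(x, train_X[i]))[:k]
    # Step 2: majority vote over their labels, random tie-breaking.
    counts = Counter(train_y[i] for i in neighbors)
    top = max(counts.values())
    return random.choice([label for label, c in counts.items() if c == top])
```

Setting k = 1 gives the 1NN classifier used in the example below; each test point simply receives the label of its single nearest training instance.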
Being a D-dimensional feature vector, the test instance x can be viewed as a point in D-dimensional feature space. A classifier assigns a label to each point in the feature space. This divides
the feature space into decision regions within which points have the same label. The boundary
separating these regions is called the decision boundary induced by the classifier.
Example 1.14. Consider two classification tasks involving the little green aliens. In the first task
in Figure 1.3(a), the task is gender classification from weight and height. The symbols are training
data. Each training instance has a label: female (red cross) or male (blue circle). The decision regions
from a 1NN classifier are shown as white and gray. In the second task in Figure 1.3(b), the task is age
classification on the same sample of training instances. The training instances now have different
labels: juvenile (red cross) or adult (blue circle). Again, the decision regions of 1NN are shown.
Notice that, for the same training instances but different classification goals, the decision boundary
can be quite different. Naturally, this is a property unique to supervised learning, since unsupervised
learning does not use any particular set of labels at all.
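The point of the example can be sketched directly: labeling every grid point in feature space by its nearest training instance carves out the decision regions, and relabeling the same training points changes where the boundary falls. The four training points and their labels below are hypothetical stand-ins, not the book's 100-alien sample:

```python
import math

# Hypothetical (weight, height) training points with two alternative labelings.
train_X = [(85, 45), (95, 60), (105, 50), (110, 65)]
gender = ['male', 'female', 'male', 'female']        # task 1: gender
age    = ['juvenile', 'juvenile', 'adult', 'adult']  # task 2: age

def nn_label(x, labels):
    """1NN: label x by its single nearest training instance."""
    i = min(range(len(train_X)), key=lambda j: math.dist(x, train_X[j]))
    return labels[i]

# Labeling a coarse grid under each task reveals two different partitions
# of the same feature space (print first letters as a crude region map).
for labels in (gender, age):
    for h in range(40, 70, 5):
        print(' '.join(nn_label((w, h), labels)[0]
                       for w in range(80, 115, 5)))
    print()
```

A single point such as (86, 46) sits in the "male" region under the gender labeling but in the "juvenile" region under the age labeling, which is exactly the dependence on the chosen labels that makes this a supervised-learning phenomenon.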
[Figure 1.3: two scatter plots of height (40–70) vs. weight (lbs., 80–110); panel (a) shows the female/male decision regions, panel (b) the juvenile/adult decision regions.]
Figure 1.3: Classify by gender or age from a training sample of 100 little green aliens, with
1-nearest-neighbor decision regions shown.
In this chapter, we introduced statistical machine learning as a foundation for the rest of
the book. We presented the unsupervised and supervised learning settings, along with concrete
examples of each. In the next chapter, we provide an overview of semi-supervised learning, which
falls somewhere between these two. Each subsequent chapter will present specific families of semi-
supervised learning algorithms.