The outlier violates the well-separated cluster assumption and leads the algorithm astray. Clearly, self-training methods such as propagating 1-nearest-neighbor are highly sensitive to outliers, which can cause incorrect labels to propagate. In the current example, one way to avoid this issue is to consider more than a single nearest neighbor, both when selecting the next point to label and when assigning it a label, as sketched below.
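To make that remedy concrete, the following Python sketch implements nearest-neighbor label propagation. The function name propagate_knn, the NumPy-only implementation, and the convention of marking unlabeled points with -1 are our own illustrative assumptions rather than anything prescribed in the text. With k = 1 it reduces to the outlier-sensitive scheme of Figure 2.4; with k > 1 it both selects the next point and labels it by consulting several labeled neighbors.

    import numpy as np

    def propagate_knn(X, y, k=3):
        """Self-training by nearest-neighbor label propagation (sketch).

        X : (n, d) array of feature vectors.
        y : (n,) array of labels; -1 marks unlabeled points (our convention).
        k : number of labeled neighbors to consult; k=1 reproduces the
            outlier-sensitive propagating 1-nearest-neighbor scheme.
        """
        y = y.copy()
        while np.any(y == -1):
            labeled = np.flatnonzero(y != -1)
            unlabeled = np.flatnonzero(y == -1)
            # Pairwise distances between every unlabeled and labeled point.
            d = np.linalg.norm(X[unlabeled, None, :] - X[None, labeled, :],
                               axis=2)
            kk = min(k, len(labeled))
            # Select the unlabeled point with the smallest average distance
            # to its kk nearest labeled neighbors.
            nearest = np.sort(d, axis=1)[:, :kk]
            i = np.argmin(nearest.mean(axis=1))
            # Label it by majority vote among those kk labeled neighbors.
            nbrs = labeled[np.argsort(d[i])[:kk]]
            vals, counts = np.unique(y[nbrs], return_counts=True)
            y[unlabeled[i]] = vals[np.argmax(counts)]
        return y

Averaging the distance over several labeled neighbors makes a lone outlier less likely to be selected early, and the majority vote keeps a single mislabeled neighbor from dictating the new label; a sufficiently dense group of outliers can, of course, still mislead the procedure.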
[Figure 2.4 appears here: four scatter-plot panels, (a)-(d), each plotting the data against weight (lbs.) on the horizontal axis, with the outlier annotated.]
Figure 2.4: Propagating 1-nearest-neighbor illustration featuring an outlier: (a) after first few iterations,
(b,c) steps highlighting the effect of the outlier, (d) final labeling of all instances, with the entire rightmost
cluster mislabeled.
This concludes our basic introduction to the motivation behind semi-supervised learning and the various issues a practitioner must keep in mind. We also presented a simple example to highlight its potential successes and failures. In the next chapter, we discuss in depth a more sophisticated class of semi-supervised learning algorithms that use generative probabilistic models.