Figure 3.5: Cluster-then-label results using single linkage hierarchical agglomerative clustering (A) and majority vote (L). The three panels plot the instances by weight (lbs.): the partially labeled data, the single linkage clustering, and the predicted labeling.
One can use any clustering algorithm A and supervised learner L. In step 1, the clustering algorithm A is unsupervised. In step 2, we learn one supervised predictor using the labeled instances that fall into each cluster, and use the predictor to label the unlabeled instances in that cluster. It is worth noting that cluster-then-label does not necessarily involve a probabilistic mixture model. The following example shows cluster-then-label with Hierarchical Agglomerative Clustering for A, and simple majority vote within each cluster for L.
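As a sketch of the general procedure, the Python outline below implements cluster-then-label for an arbitrary choice of A and L. The callables cluster and fit_predictor, and the fallback for clusters that contain no labeled data, are assumptions of this sketch rather than part of the algorithm as stated here.

    import numpy as np

    def cluster_then_label(X_l, y_l, X_u, cluster, fit_predictor):
        """Label X_u via cluster-then-label.

        cluster(X) -> one cluster id per row   (plays the role of A)
        fit_predictor(X, y) -> callable        (plays the role of L)
        """
        X_all = np.vstack([X_l, X_u])
        ids = cluster(X_all)              # step 1: unsupervised clustering
        n_l = len(X_l)
        y_u = np.empty(len(X_u), dtype=y_l.dtype)
        for c in np.unique(ids):
            members = ids == c
            lab, unlab = members[:n_l], members[n_l:]
            if not unlab.any():
                continue
            if lab.any():
                # step 2: train L on the labeled instances in this cluster
                f = fit_predictor(X_l[lab], y_l[lab])
            else:
                # a cluster with no labeled data: one reasonable fallback
                # (an assumption of this sketch) is to train L on all labels
                f = fit_predictor(X_l, y_l)
            # use the predictor to label the cluster's unlabeled instances
            y_u[unlab] = f(X_u[unlab])
        return y_u

With hierarchical agglomerative clustering as cluster and a majority-vote learner as fit_predictor, this reduces to the setting of the example below.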
Example 3.10. Cluster-then-Label with Hierarchical Agglomerative Clustering. In this example, we apply the cluster-then-label approach to the aliens data from Chapter 1. For step 1, we use hierarchical agglomerative clustering (Algorithm 1.1) with Euclidean distance as the underlying distance function, and the single linkage method to determine distances between clusters. Because the labeled data contains only two classes, we heuristically stop the algorithm once only two clusters remain. Steps 2-4 of cluster-then-label find the majority label within each cluster, and assign this label to all unlabeled instances in the cluster.
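A minimal sketch of this example follows, with SciPy's linkage and fcluster standing in for Algorithm 1.1. The weight/height arrays are hypothetical stand-ins, since the aliens data itself is not reproduced here.

    import numpy as np
    from collections import Counter
    from scipy.cluster.hierarchy import linkage, fcluster

    # Hypothetical stand-in for the partially labeled aliens data:
    # columns are weight (lbs.) and height; -1 marks an unlabeled instance.
    X = np.array([[82., 44.], [88., 46.], [94., 48.], [102., 50.],
                  [82., 62.], [88., 64.], [94., 66.], [102., 68.]])
    y = np.array([0, -1, -1, -1, 1, -1, -1, -1])

    # Step 1: single linkage agglomerative clustering with Euclidean
    # distance, stopped heuristically once two clusters remain.
    Z = linkage(X, method="single", metric="euclidean")
    clusters = fcluster(Z, t=2, criterion="maxclust")

    # Steps 2-4: find the majority label among the labeled instances in
    # each cluster, and assign it to the cluster's unlabeled instances.
    for c in np.unique(clusters):
        in_c = clusters == c
        labeled = y[in_c & (y != -1)]
        if len(labeled) > 0:
            majority = Counter(labeled).most_common(1)[0][0]
            y[in_c & (y == -1)] = majority

    print(y)  # every instance now carries a predicted label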
Figure 3.5 shows the original partially labeled data, the two clusters, and the final labels
predicted for all the data. In this case, because the clusters coincide with the true labeling of the data,
we correctly classify all unlabeled instances. The lines between instances in the center panel indicate
the linkages responsible for merging clusters until only two remain.
It turns out that using single linkage is critically important here, where the natural clusters
are long and skinny. A different choice—e.g., complete linkage clustering—tends to form rounder
clusters. When applied to this data, complete linkage produces the results shown in Figure 3.6. The
clusters found by complete linkage do not match up with the true labeling of the data. In fact, both
labeled instances end up in the same cluster. Because there is no majority label in either cluster,
the majority vote algorithm ends up breaking ties randomly, thus assigning random labels to all
unlabeled instances.
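In the sketch above, this comparison amounts to changing a single line:

    # complete linkage tends to form rounder clusters on the same data
    Z = linkage(X, method="complete", metric="euclidean")

Whether this reproduces the failure on the hypothetical stand-in data is not guaranteed; in the book's example, the labeled instances end up together, so the majority vote within each cluster is a tie (or empty) and must be broken arbitrarily.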
The point of this example is not to show that complete linkage is bad and single linkage is good. In fact, it could have been the other way around for other datasets! Instead, the example is meant to show that the success of cluster-then-label hinges on whether the clustering algorithm's assumptions match the true structure of the data.