CHAPTER 3
Mixture Models and EM
Unlabeled data tells us how the instances from all the classes, mixed together, are distributed. If
we know how the instances from each class are distributed, we may decompose the mixture into
individual classes. This is the idea behind mixture models. In this chapter, we formalize the idea
of mixture models for semi-supervised learning. First we review some concepts in probabilistic
modeling. Readers familiar with machine learning can skip to Section 3.2.
3.1 MIXTURE MODELS FOR SUPERVISED CLASSIFICATION
Example 3.1. Gaussian Mixture Model with Two Components. Suppose training data comes
from two one-dimensional Gaussian distributions. Figure 3.1 illustrates the underlying p(x | y)
distributions and a small training sample with only two labeled instances and several unlabeled instances.
[Figure 3.1 legend: negative distribution, positive distribution, optimal decision boundary,
unlabeled instance, negative instance, positive instance.]
Figure 3.1: Two classes forming a mixture model with 1-dimensional Gaussian distribution components.
The dashed curves are p(x | y = −1) and p(x | y = 1), respectively. The labeled and unlabeled instances
are plotted on the x-axis.
Suppose we know that the data comes from two Gaussian distributions, but we do not know
their parameters (the mean, variance, and prior probabilities, which we will define soon). We can
use the data (labeled and unlabeled) to estimate these parameters for both distributions. Note that,
in this example, the labeled data is actually misleading: the labeled instances are both to the right of
the means of the true distributions. The unlabeled data, however, helps us to identify the means of
the two Gaussian distributions. Computationally, we select parameters to maximize the probability
of generating such training data from the proposed model. In particular, the training samples are
more likely if the means of the Gaussians are centered over the unlabeled data, rather than shifted
to the right over the labeled data.
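The comparison in the last two sentences can be checked numerically. The following is a minimal sketch, not the procedure developed later in the chapter. It assumes the usual form of the log likelihood for this setting: a labeled instance (x, y) contributes log p(y) p(x | y), and an unlabeled instance x contributes log of the sum over y of p(y) p(x | y). The data values, the two candidate parameter settings, and the names gaussian_pdf and log_likelihood are all hypothetical, chosen only to mimic the situation in Figure 3.1.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(labeled, unlabeled, params):
    """Log likelihood of labeled and unlabeled data under a two-class Gaussian mixture.

    params maps each class label y to a tuple (prior, mean, variance).
    """
    total = 0.0
    # A labeled instance uses only the component of its own class.
    for x, y in labeled:
        prior, mean, var = params[y]
        total += math.log(prior * gaussian_pdf(x, mean, var))
    # An unlabeled instance sums (marginalizes) over both classes.
    for x in unlabeled:
        total += math.log(
            sum(prior * gaussian_pdf(x, mean, var) for prior, mean, var in params.values())
        )
    return total


# Hypothetical sample: the unlabeled points cluster around -1.5 and 1.5,
# while both labeled points sit to the right of those cluster centers.
labeled = [(-0.7, -1), (2.3, 1)]
unlabeled = [-2.3, -1.8, -1.5, -1.2, -0.7, 0.7, 1.2, 1.5, 1.8, 2.3]

# Candidate A: means centered over the unlabeled data.
centered = {-1: (0.5, -1.5, 0.5), 1: (0.5, 1.5, 0.5)}
# Candidate B: means shifted right, onto the two labeled instances.
shifted = {-1: (0.5, -0.7, 0.5), 1: (0.5, 2.3, 0.5)}

ll_centered = log_likelihood(labeled, unlabeled, centered)
ll_shifted = log_likelihood(labeled, unlabeled, shifted)

print("centered means:", ll_centered)
print("shifted means: ", ll_shifted)
```

On this made-up sample the centered setting has the higher log likelihood: the shifted means fit the two labeled points perfectly, but that gain is outweighed by the poorer fit to the ten unlabeled points. Both candidates share the same priors and variances, so the difference comes from the means alone.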