The M-step updates the model parameters using the current γ values as weights on the unlabeled
instances. If we think of the E-step as creating fractional labeled instances split between the classes,
then the M-step simply computes new MLE parameters using these fractional instances and the
labeled data. The algorithm stops when the log likelihood (3.13) converges (i.e., stops changing
from one iteration to the next). The data log likelihood in the case of a mixture of two Gaussians is
\[
\log p(\mathcal{D} \mid \theta) \;=\; \sum_{i=1}^{l} \log \pi_{y_i}\, N(\mathbf{x}_i;\, \mu_{y_i}, \Sigma_{y_i}) \;+\; \sum_{i=l+1}^{l+u} \log \sum_{j=1}^{2} \pi_j\, N(\mathbf{x}_i;\, \mu_j, \Sigma_j), \qquad (3.20)
\]
where we have marginalized over the two classes for the unlabeled data.
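To make the E-step/M-step alternation concrete, the following is a minimal sketch of this EM procedure for a two-class Gaussian mixture using both labeled and unlabeled data, stopping when the log likelihood (3.20) converges. The function name, the initialization from the labeled data alone, and the small covariance regularization are illustrative assumptions, not choices prescribed by the text.

    import numpy as np
    from scipy.stats import multivariate_normal

    def semi_supervised_em(X_l, y_l, X_u, max_iter=100, tol=1e-6):
        # EM for a two-class Gaussian mixture with labeled data (X_l, y_l)
        # and unlabeled data X_u. Names and initialization are illustrative.
        classes, d = np.array([0, 1]), X_l.shape[1]
        # Initialize with the MLE on the labeled data alone.
        pi = np.array([(y_l == c).mean() for c in classes])
        mu = np.array([X_l[y_l == c].mean(axis=0) for c in classes])
        Sigma = np.array([np.cov(X_l[y_l == c].T) + 1e-6 * np.eye(d) for c in classes])
        X = np.vstack([X_l, X_u])
        prev_ll = -np.inf
        for _ in range(max_iter):
            # E-step: responsibilities gamma_ij = p(y = j | x_i, theta) for each
            # unlabeled x_i, i.e., fractional instances split between the classes.
            dens = np.column_stack(
                [pi[j] * multivariate_normal.pdf(X_u, mu[j], Sigma[j]) for j in classes])
            gamma = dens / dens.sum(axis=1, keepdims=True)
            # M-step: weighted MLE over the labeled data plus the fractional
            # unlabeled instances, using the gamma values as weights.
            for j in classes:
                w = np.concatenate([(y_l == j).astype(float), gamma[:, j]])
                pi[j] = w.mean()
                mu[j] = (w[:, None] * X).sum(axis=0) / w.sum()
                diff = X - mu[j]
                Sigma[j] = (w[:, None, None] *
                            np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / w.sum()
                Sigma[j] += 1e-6 * np.eye(d)
            # Data log likelihood (3.20): labeled term plus marginalized unlabeled term.
            lab = np.sum([np.log(pi[y] * multivariate_normal.pdf(x, mu[y], Sigma[y]))
                          for x, y in zip(X_l, y_l)])
            unl = np.sum(np.log(np.column_stack(
                [pi[j] * multivariate_normal.pdf(X_u, mu[j], Sigma[j]) for j in classes]
            ).sum(axis=1)))
            ll = lab + unl
            if ll - prev_ll < tol:   # stop when the log likelihood stops changing
                break
            prev_ll = ll
        return pi, mu, Sigma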
It is instructive to note the similarity between EM and self-training. EM can be viewed as a
special form of self-training, where the current classifier θ would label the unlabeled instances with
all possible labels, but each with fractional weights p(H | D, θ). Then all these augmented unlabeled data, instead of the top few most confident ones, are used to update the classifier.
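The contrast lies in how the classifier's posteriors over the unlabeled instances are turned into training weights. The short sketch below (the helper names and the cutoff k are hypothetical, chosen only for illustration) keeps every unlabeled instance with soft fractional weights in the EM style, versus keeping only the few most confident instances with hard labels in the self-training style.

    import numpy as np

    def em_style_weights(gamma):
        # EM: every unlabeled instance is kept for every class, weighted by its
        # fractional responsibility p(y = j | x_i, theta).
        return gamma

    def self_training_style_weights(gamma, k=10):
        # Self-training: keep only the k most confident unlabeled instances
        # (k is an illustrative parameter) and give each a hard 0/1 label.
        top = np.argsort(-gamma.max(axis=1))[:k]
        hard = np.zeros_like(gamma)
        hard[top, gamma[top].argmax(axis=1)] = 1.0
        return hard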
3.4 THE ASSUMPTIONS OF MIXTURE MODELS
Mixture models provide a framework for semi-supervised learning in which the role of unlabeled
data is clear. In practice, this form of semi-supervised learning can be highly effective if the generative
model is (nearly) correct. It is worth noting the assumption made here:
Remark 3.6. Mixture Model Assumption The data actually comes from the mixture model, where the number of components, the prior p(y), and the conditional p(x | y) are all correct.
Unfortunately, it can be difficult to assess model correctness, since we do not have much labeled data. In practice, one often chooses a generative model based on domain knowledge and/or mathematical convenience. However, if the model is wrong, semi-supervised learning can actually hurt performance. In that case, one may be better off using only the labeled data and performing supervised learning instead. The following example shows the effect of an incorrect model.
Example 3.7. An Incorrect Generative Model Suppose a dataset contains four clusters of data, two from each class. This dataset is shown in Figure 3.2. The correct decision boundary is a horizontal line along the x-axis. Clearly, the data is not generated from two Gaussians. If we insist on modeling each class with a single Gaussian, the results may be poor. Figure 3.3 illustrates this point by comparing two possible GMMs fit to this data. In panel (a), the learned model fits the unlabeled data quite well (it has high log likelihood), but predictions made with this model will incur approximately 50% error. In contrast, the model shown in panel (b) leads to much better accuracy. However, (b) would not be favored by the EM algorithm, since it has a lower log likelihood.
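A small simulation can reproduce this effect. The four-cluster layout below is hypothetical, chosen in the spirit of Figure 3.2 (the exact cluster positions, spreads, and sample sizes are assumptions); the point is only that the higher-likelihood two-Gaussian fit, analogous to panel (a), ends up near 50% error.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Hypothetical four-cluster layout: the positive class has clusters near
    # (-3, 1.5) and (3, 1.5); the negative class near (-3, -1.5) and (3, -1.5).
    # Clusters that share an x-position are closer together than same-class clusters.
    def cluster(center, n=200):
        return rng.normal(center, scale=[0.5, 0.6], size=(n, 2))

    X_pos = np.vstack([cluster([-3, 1.5]), cluster([3, 1.5])])
    X_neg = np.vstack([cluster([-3, -1.5]), cluster([3, -1.5])])
    X = np.vstack([X_pos, X_neg])
    y = np.array([1] * len(X_pos) + [0] * len(X_neg))

    # One Gaussian per mixture component: EM tends to pair the two left clusters
    # and the two right clusters, since that fit has higher likelihood (panel (a)).
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    pred = gmm.predict(X)

    # Align component ids with classes as favorably as possible, then measure error.
    err = min(np.mean(pred != y), np.mean(pred != 1 - y))
    print(f"error of the high-likelihood two-Gaussian fit: {err:.2f}")  # roughly 0.5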
As mentioned above, we may be better off using only labeled data and supervised learning
in this case. If we have labeled data in the bottom left cluster and top right cluster, the supervised