[Figure 3.2 plot: points labeled y = 1 and y = −1; horizontal axis x1]
Figure 3.2: Two classes in four clusters (each a 2-dimensional Gaussian distribution).
[Figure 3.3 plots. Panel (a): wrong model, higher log likelihood (−847.9309). Panel (b): correct model, lower log likelihood (−921.143).]
Figure 3.3: (a) Good fit under the wrong model assumption. The decision boundary is vertical, thus
producing widespread misclassification. (b) Worse fit under the wrong model assumption. However, the decision
boundary is correct.
decision boundary would be approximately the line y = −x, which would result in only about 25%
error.
There are a number of ways to alleviate the danger of using the wrong model. One obvious
way is to refine the model to fit the task better, which requires domain knowledge. In the above
example, one might model each class itself as a GMM with two components, instead of a single
Gaussian.
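As a minimal sketch (not the book's code), the refined model can be written as a classifier in which each class-conditional density p(x | y) is itself a two-component Gaussian mixture, combined with the class prior by Bayes' rule. The helper names (`gaussian_pdf`, `class_density`, `predict`) and all parameter values are hypothetical: the clusters are placed in an XOR-like layout for illustration and are not read from Figure 3.2. In practice the mixture parameters would be estimated from data, for example with EM.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of the multivariate Gaussian N(mean, cov) at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    # Quadratic form diff^T inv diff, one value per row of x.
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
    return norm * np.exp(expo)

def class_density(x, weights, means, covs):
    """p(x | y) modeled as a mixture of Gaussians (two components here)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

def predict(x, priors, class_params):
    """Bayes rule: pick the class index maximizing p(y) * p(x | y)."""
    scores = np.stack(
        [p * class_density(x, *params) for p, params in zip(priors, class_params)],
        axis=1,
    )
    return np.argmax(scores, axis=1)

# Hypothetical parameters: each class is an equal-weight mixture of two
# unit-covariance Gaussians. Index 0 stands for y = 1, index 1 for y = -1.
I = np.eye(2)
class_pos = ([0.5, 0.5], [np.array([3.0, 3.0]), np.array([-3.0, -3.0])], [I, I])
class_neg = ([0.5, 0.5], [np.array([-3.0, 3.0]), np.array([3.0, -3.0])], [I, I])

points = np.array([[3.0, 3.0], [-3.0, -3.0], [-3.0, 3.0], [3.0, -3.0]])
labels = predict(points, [0.5, 0.5], [class_pos, class_neg])
print(labels)  # [0 0 1 1]
```

With a single Gaussian per class, both class means in this layout would sit at the origin, so the model could not separate the two classes; giving each class two components lets the class-conditional densities follow the actual clusters.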
Another way is to de-emphasize the unlabeled data when the correctness of the model is uncertain.
Specifically, we scale the contribution from unlabeled data in the semi-supervised log likeli-