[Figure 2.2. Panels: True distribution; Training sets 1 to 5. Legend: negative distribution, positive distribution, unlabeled instance, negative instance, positive instance. Decision boundaries shown: optimal, supervised, generative model, S3VM, graph-based.]
Figure 2.2: Two classes drawn from overlapping Gaussian distributions (top panel). Decision boundaries
learned by several algorithms are shown for five random samples of labeled and unlabeled training samples.
chapters. The first one is a probabilistic generative model with two Gaussian distributions learned
with EM (Chapter 3); this model makes the correct model assumption. Its decision boundaries
are shown in Figure 2.2 as dashed lines. In this case, the boundaries tend to be closer to the true
boundary and similar to one another, i.e., this algorithm has low variance. Averaged over 1000 trials,
the test sample error rate for this algorithm is 30.2%. The average decision boundary is at -0.003 with
a standard deviation of 0.55, indicating the algorithm is both more accurate and more stable than
the supervised model.
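The procedure described above can be sketched in a few lines. This is a minimal illustration, not the code behind Figure 2.2: the class means (-1 and +1), the shared standard deviation (2), the one labeled point per class, the 100 unlabeled points, and all function names are assumptions chosen for the example. It fits a two-Gaussian generative model with EM on labeled plus unlabeled data, then reads off the decision boundary as the point between the two estimated means where the two class posteriors are closest to equal.

```python
import numpy as np

def log_joint(x, prior, mean, std):
    # log of prior * N(x; mean, std^2), dropping the constant shared by both classes
    return np.log(prior) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def fit_two_gaussians_em(x_lab, y_lab, x_unl, n_iter=100):
    """EM for a 1-D two-class Gaussian model; y_lab holds 0/1 labels,
    with at least one labeled point per class."""
    x_all = np.concatenate([x_lab, x_unl])
    r_lab = y_lab.astype(float)  # labeled responsibilities stay fixed
    mean = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
    std = np.full(2, x_all.std())
    prior = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for each unlabeled point
        gap = (log_joint(x_unl, prior[0], mean[0], std[0])
               - log_joint(x_unl, prior[1], mean[1], std[1]))
        r_unl = 1.0 / (1.0 + np.exp(np.clip(gap, -50.0, 50.0)))
        r = np.concatenate([r_lab, r_unl])
        # M-step: weighted maximum-likelihood updates for each class
        for k, w in enumerate((1.0 - r, r)):
            prior[k] = w.mean()
            mean[k] = np.sum(w * x_all) / w.sum()
            var = np.sum(w * (x_all - mean[k]) ** 2) / w.sum()
            std[k] = max(np.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return prior, mean, std

def decision_boundary(prior, mean, std, n_grid=4001):
    # Grid search between the two means for the point of (near-)equal posterior
    grid = np.linspace(mean.min(), mean.max(), n_grid)
    gap = (log_joint(grid, prior[1], mean[1], std[1])
           - log_joint(grid, prior[0], mean[0], std[0]))
    return grid[np.argmin(np.abs(gap))]

def run_trials(n_trials, rng, n_unl=100):
    # Repeat the experiment on fresh random samples and collect the boundaries
    boundaries = []
    for _ in range(n_trials):
        x_lab = np.array([rng.normal(-1.0, 2.0), rng.normal(1.0, 2.0)])
        y_lab = np.array([0, 1])
        y_unl = rng.integers(0, 2, size=n_unl)  # hidden labels, used only to sample
        x_unl = rng.normal(np.where(y_unl == 1, 1.0, -1.0), 2.0)
        prior, mean, std = fit_two_gaussians_em(x_lab, y_lab, x_unl)
        boundaries.append(decision_boundary(prior, mean, std))
    return np.array(boundaries)

if __name__ == "__main__":
    b = run_trials(20, np.random.default_rng(0))
    print("mean boundary:", b.mean(), "std of boundary:", b.std())
```

The spread of the returned boundaries across trials is the quantity the text summarizes as the standard deviation of the decision boundary. The numbers this sketch prints depend on the assumed parameters and need not match those reported above.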
The second model is a semi-supervised support vector machine (Chapter 6), which assumes
that the decision boundary should not pass through dense unlabeled data regions. However, since the
two classes strongly overlap, the true decision boundary actually passes through the densest region.
Therefore, the model assumption does not entirely match the task. The learned decision boundaries
are shown in Figure 2.2 as dash-dotted lines.¹ The result is better than supervised classification
and about the same as that of the probabilistic generative model that makes the correct model
¹ The semi-supervised support vector machine results were obtained using transductive SVM code similar to SVM-light.
 