CHAPTER 6
Semi-Supervised Support Vector Machines
The intuition behind Semi-Supervised Support Vector Machines (S3VMs) is very simple.
Figure 6.1(a) shows a completely labeled dataset. If we were to draw a straight line to separate the two
classes, where should the line be? One reasonable place is right in the middle, such that its distance
to the nearest positive or negative instance is maximized. This is the linear decision boundary found
by Support Vector Machines (SVMs), and is shown in Figure 6.1(a). The figure also shows two
dashed lines that go through the nearest positive and negative instances. The distance from the
decision boundary to a dashed line is called the geometric margin. As mentioned above, this margin is
maximized by SVMs.
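The geometric margin described above can be computed directly. The following is a minimal sketch, not an SVM solver: it assumes a hypothetical four-point toy dataset and only compares vertical candidate lines x1 = c. The function name `geometric_margin` and all data values are illustrative assumptions.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the line w.x + b = 0.

    The value is positive only if every point lies on its correct side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, y in zip(points, labels))

# Hypothetical toy data: two positives on the right, two negatives on the left.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [+1, +1, -1, -1]

# Candidate boundaries are vertical lines x1 = c, i.e. w = (1, 0) and b = -c.
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
margins = {c: geometric_margin((1.0, 0.0), -c, points, labels)
           for c in candidates}

# The candidate "right in the middle" has the largest geometric margin.
best = max(margins, key=margins.get)
print(best, margins[best])
```

Among these candidates, the line midway between the nearest positive and negative instances attains the largest margin, which matches the intuition in the text. A real SVM searches over all orientations and offsets rather than a fixed grid.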
[Figure 6.1, two panels: (a) SVM decision boundary; (b) S3VM decision boundary]
Figure 6.1: (a) With only labeled data, the linear decision boundary that maximizes the distance to the
nearest labeled instance is shown as a solid line. Its associated margin is shown as dashed lines. (b) With
additional unlabeled data, under the assumption that the classes are well-separated, the decision boundary
seeks a gap in the unlabeled data.
What if we have many additional unlabeled instances, distributed as in Figure 6.1(b)? The
SVM decision boundary will cut through dense unlabeled data regions. This seems undesirable if
we assume that the two classes are well-separated. Instead, the best decision boundary now seems to
be the one in Figure 6.1(b), which falls into the gap between the unlabeled data. This new decision
boundary still separates the two classes in the labeled data, though its margin is smaller than that of
the SVM decision boundary (this can be easily verified by measuring the distance to the nearest labeled
point). The new decision boundary is the one found by S3VMs, and is defined by both labeled and
unlabeled data. We will now make this intuition precise.
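Before the formal treatment, the gap-seeking intuition can be illustrated on a one-dimensional toy problem. This sketch is a simplified stand-in and not the S3VM objective: it assumes hypothetical data (not taken from Figure 6.1), restricts the boundary to a threshold c on a grid, and scores each threshold by its distance to the nearest point. The helper names `labeled_margin` and `unlabeled_clearance` are illustrative.

```python
# Hypothetical 1-D toy data: one labeled point per class plus unlabeled points
# that form two clusters with a gap between 0.5 and 2.0.
labeled = [(-3.0, -1), (3.0, +1)]
unlabeled = [-3.5, -2.5, -1.5, -0.5, 0.0, 0.5, 2.0, 2.5, 3.5]

def labeled_margin(c):
    """Distance from threshold c to the nearest labeled point.

    Negative if some labeled point falls on the wrong side of c.
    """
    return min(y * (x - c) for x, y in labeled)

def unlabeled_clearance(c):
    """Distance from threshold c to the nearest unlabeled point."""
    return min(abs(x - c) for x in unlabeled)

# Candidate thresholds on a grid; keep only those that separate the labeled data.
candidates = [i * 0.25 for i in range(-11, 12)]
separating = [c for c in candidates if labeled_margin(c) > 0]

# Labeled data only: maximize the margin to the labeled points.
svm_like = max(separating, key=labeled_margin)

# Labeled and unlabeled data: also stay away from every unlabeled point.
gap_seeking = max(
    separating,
    key=lambda c: min(labeled_margin(c), unlabeled_clearance(c)),
)

print(svm_like, unlabeled_clearance(svm_like))    # sits on an unlabeled point
print(gap_seeking, labeled_margin(gap_seeking))   # in the gap, smaller margin
```

On this toy data the labeled-only threshold lands inside the dense unlabeled region, while the gap-seeking threshold moves into the empty interval at the cost of a smaller margin to the labeled points, mirroring the trade-off described for Figure 6.1(b).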
 