to measure the confidence is the entropy. For a Bernoulli random variable with probability p, the
entropy is defined as

H(p) = −p log p − (1 − p) log(1 − p).    (6.28)
The entropy H reaches its minimum 0 when p = 0 or p = 1, i.e., when the outcome is most certain;
H reaches its maximum 1 when p = 0.5, i.e., most uncertain. Given an unlabeled training sample
{x_j}_{j=l+1}^{l+u}, the entropy regularizer for logistic regression is defined as
Ω(f) = Σ_{j=l+1}^{l+u} H(p(y = 1 | x_j, w, b)) = Σ_{j=l+1}^{l+u} H(1 / (1 + exp(−f(x_j)))).    (6.29)
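As an illustrative sketch (not from the text), the entropy of Eq. (6.28) and the regularizer of Eq. (6.29) can be computed in a few lines of NumPy. Base-2 logarithms are used so that the maximum entropy is 1, as stated above; the function names and the decision values passed in are made-up for illustration.

```python
import numpy as np

def bernoulli_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p); H(0) = H(1) = 0, H(0.5) = 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def entropy_regularizer(f_values):
    """Sum of H(1 / (1 + exp(-f(x_j)))) over unlabeled decision values, Eq. (6.29)."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(f_values)))  # p(y = 1 | x_j, w, b)
    return bernoulli_entropy(p).sum()

print(bernoulli_entropy(0.5))               # 1.0: most uncertain outcome
# Confident predictions (|f| large) incur almost no penalty,
# while uncertain ones (f near 0) incur nearly one bit each:
print(entropy_regularizer([5.0, -6.0]))     # small
print(entropy_regularizer([0.0, 0.0]))      # 2.0
```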
The entropy regularizer is small if the classification on the unlabeled instances is certain. Figure 6.4(b)
shows the entropy regularizer on a single unlabeled instance x as a function of f( x ) . Note its similarity
to the hat loss in Figure 6.3(b). In direct analogy to S3VMs, we can define semi-supervised logistic
regression by incorporating this entropy regularizer:
min_{w,b} Σ_{i=1}^{l} log(1 + exp(−y_i f(x_i))) + λ_1 ||w||^2 + λ_2 Σ_{j=l+1}^{l+u} H(1 / (1 + exp(−f(x_j))))    (6.30)
6.4 THE ASSUMPTION OF S3VMS AND ENTROPY REGULARIZATION
Remark 6.1. The assumption of both S3VMs and entropy regularization is that the classes are
well-separated, such that the decision boundary falls into a low density region in the feature space,
and does not cut through dense unlabeled data.
If this assumption does not hold, these algorithms may be led astray. We now describe an
example scenario where S3VMs may lead to particularly poor performance.
Example 6.2. S3VMs when the Model Assumption Does Not Hold Consider the data shown in
Figure 6.5. The underlying data distribution p(x) is uniform in a circle of radius 0.5, except for a gap
of width 0.2 along the diagonal y = −x where the density is 0. The true class boundary is along the
anti-diagonal y = x, though. Clearly, the classes are not well-separated, and the low density region
does not correspond to the true decision boundary. This poses two problems. First, consider the case
in which the labeled instances appear on the same side of the low density region (Figure 6.5(a)). An
S3VM's search for a gap between the two classes may get stuck in one of many possible local minima.
The resulting decision boundary may be worse than the decision boundary of an SVM that does
not try to exploit unlabeled data at all. Second and more severely, if the labeled instances appear on
opposite sides of the gap (Figure 6.5(b)), the S3VM will be attracted to this region and produce a
very poor classifier that gets half of its predictions incorrect.
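A distribution like the one in this example can be reproduced by rejection sampling. The sketch below is one made-up way to generate such data (the function name and seed are arbitrary assumptions), useful for experimenting with how an S3VM's decision boundary snaps to the low-density gap rather than the true boundary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

def sample_gap_disk(n):
    """Rejection-sample n points uniform in a disk of radius 0.5, excluding a
    gap of width 0.2 along the diagonal y = -x (distance |x + y| / sqrt(2) < 0.1)."""
    pts = []
    while len(pts) < n:
        px, py = rng.uniform(-0.5, 0.5, size=2)
        inside_disk = px**2 + py**2 <= 0.25
        in_gap = abs(px + py) / np.sqrt(2) < 0.1
        if inside_disk and not in_gap:
            pts.append((px, py))
    return np.array(pts)

X = sample_gap_disk(500)
# The true class boundary is the anti-diagonal y = x, which cuts straight
# through the dense regions rather than through the low-density gap.
y = np.sign(X[:, 1] - X[:, 0])
```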