We also see in Fig. 12.14 two parallel hyperplanes at distance γ from the central hyperplane w.x + b = 0, and these each touch one or more of the support vectors. The latter are the points that actually constrain the dividing hyperplane, in the sense that they are all at distance γ from the hyperplane. In most cases, a d-dimensional set of points has d + 1 support vectors, as is the case in Fig. 12.14. However, there can be more support vectors if too many points happen to lie on the parallel hyperplanes. We shall see an example based on the points of Fig. 11.1, where it turns out that all four points are support vectors, even though two-dimensional data normally has three.
A tentative statement of our goal is:
• Given a training set (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n), maximize γ (by varying w and b) subject to the constraint that, for all i = 1, 2, . . . , n,
y_i (w.x_i + b) ≥ γ
Notice that y_i, which must be +1 or −1, determines which side of the hyperplane the point x_i must be on, so the ≥ relationship to γ is always correct. However, it may be easier to express this condition as two cases: if y_i = +1, then w.x_i + b ≥ γ, and if y_i = −1, then w.x_i + b ≤ −γ.
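As an illustration of the constraint, the following minimal sketch evaluates y_i (w.x_i + b) for every training point and checks it against a candidate γ. The four points, the weight vector and the offset are hypothetical values chosen for this example (they are not taken from any figure), and the function names are likewise made up here.

```python
import numpy as np

def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every training point."""
    return y * (X @ w + b)

def satisfies_constraint(w, b, X, y, gamma):
    """True if y_i * (w . x_i + b) >= gamma holds for all i."""
    return bool(np.all(functional_margins(w, b, X, y) >= gamma))

# Hypothetical toy data: two positive and two negative points in the plane.
X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A hypothetical separating hyperplane w.x + b = 0 (not optimized).
w = np.array([-1.0, 1.0])
b = 0.5

margins = functional_margins(w, b, X, y)
print(margins)                                   # one value per point
print(satisfies_constraint(w, b, X, y, 1.5))     # gamma small enough
print(satisfies_constraint(w, b, X, y, 2.0))     # gamma too large
```

For a fixed w and b, the largest γ that satisfies the constraint is simply the smallest of these per-point values.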
Unfortunately, this formulation doesn't really work properly. The problem is that by increasing w and b, we can always allow a larger value of γ. For example, suppose that w and b satisfy the constraint above. If we replace w by 2w and b by 2b, we observe that for all i, y_i ((2w).x_i + 2b) ≥ 2γ. Thus, 2w and 2b are always a better choice than w and b, so there is no best choice and no maximum γ.
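The scaling argument can be checked numerically. The sketch below, which uses hypothetical toy data and a made-up helper name, multiplies w and b by a factor k and reports the largest γ the constraint then allows. The hyperplane w.x + b = 0 itself does not move, yet the attainable γ grows in proportion to k.

```python
import numpy as np

def largest_gamma(w, b, X, y):
    """Largest gamma with y_i * (w . x_i + b) >= gamma for all i."""
    return float(np.min(y * (X @ w + b)))

# Hypothetical toy data and separating hyperplane (illustrative values only).
X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([-1.0, 1.0])
b = 0.5

base = largest_gamma(w, b, X, y)
for k in (1, 2, 4, 8):
    # Scaling (w, b) by k describes the same hyperplane but scales gamma by k.
    print(k, largest_gamma(k * w, k * b, X, y))
```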
12.3.2 Normalizing the Hyperplane
The solution to the problem that we described intuitively above is to normalize the weight vector w. That is, the unit of measure perpendicular to the separating hyperplane is the unit vector w/||w||. Recall that ||w|| is the Frobenius norm (for a vector, the same as the Euclidean norm), or the square root of the sum of the squares of the components of w. We shall require that w be such that the parallel hyperplanes that just touch the support vectors are described by the equations w.x + b = +1 and w.x + b = −1, as suggested by Fig. 12.15.
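A minimal sketch of this normalization, assuming we already have some separating hyperplane for hypothetical toy data: dividing w and b by the smallest value of y_i (w.x_i + b) makes the closest points satisfy w.x + b = +1 or w.x + b = −1. The distance from the central hyperplane to either parallel hyperplane is then 1/||w||, a standard geometric fact that this passage does not itself state. The helper name and the data are assumptions of this example, and the code only rescales a given hyperplane; it does not search for the best one.

```python
import numpy as np

def normalize_hyperplane(w, b, X, y):
    """Rescale (w, b) so the closest points have y_i * (w . x_i + b) = 1."""
    m = np.min(y * (X @ w + b))
    if m <= 0:
        raise ValueError("w.x + b = 0 does not separate the training set")
    return w / m, b / m

# Hypothetical toy data and separating hyperplane (illustrative values only).
X = np.array([[1.0, 3.0], [2.0, 3.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([-1.0, 1.0])
b = 0.5

w_n, b_n = normalize_hyperplane(w, b, X, y)
margins = y * (X @ w_n + b_n)

# Points lying on w.x + b = +1 or w.x + b = -1 for this particular hyperplane.
touching = np.isclose(margins, 1.0)

# Distance from the central hyperplane to each parallel hyperplane.
gamma = 1.0 / np.linalg.norm(w_n)

print(w_n, b_n)
print(touching)
print(gamma)
```

Because the rescaling fixes the right-hand sides at +1 and −1, doubling w and b is no longer allowed, and the distance 1/||w|| becomes a quantity that depends only on the hyperplane.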