We also see in Fig. 12.14 two parallel hyperplanes at distance $\gamma$ from the central hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$, and these each touch one or more of the support vectors. The latter are the points that actually constrain the dividing hyperplane, in the sense that they are all at distance $\gamma$ from the hyperplane. In most cases, a $d$-dimensional set of points has $d + 1$ support vectors, as is the case in Fig. 12.14. However, there can be more support vectors if too many points happen to lie on the parallel hyperplanes. We shall see an example based on the points of Fig. 11.1, where it turns out that all four points are support vectors, even though two-dimensional data normally has three.
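To make the notion of a support vector concrete, the Python sketch below takes a hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$ as already given and reports which points lie at the minimum distance $|\mathbf{w}\cdot\mathbf{x} + b|/\|\mathbf{w}\|$ from it. It does not search for the best hyperplane. The six points are hypothetical (they are not the points of Fig. 12.14 or Fig. 11.1), and the helper names `dot` and `support_vectors` are illustrative. For this particular data the line $x_1 + x_2 = 3$ is the maximum-margin separator, and it was chosen so that four points, rather than the usual three, sit on the two parallel lines.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))


def support_vectors(w, b, points, tol=1e-9):
    """Given a (presumed maximum-margin) hyperplane w.x + b = 0, return the
    smallest point-to-hyperplane distance and the points attaining it."""
    norm = math.sqrt(dot(w, w))
    dists = [abs(dot(w, x) + b) / norm for x in points]
    margin = min(dists)
    support = [x for x, d in zip(points, dists) if abs(d - margin) <= tol]
    return margin, support


# Hypothetical 2-D points; the first three are class -1, the last three class +1.
points = [(0, 0), (1, 1), (0, 2), (2, 2), (3, 1), (3, 3)]
w, b = (1.0, 1.0), -3.0          # the line x1 + x2 = 3
margin, support = support_vectors(w, b, points)
print(round(margin, 4), support)  # 0.7071 [(1, 1), (0, 2), (2, 2), (3, 1)]
```

Here two points of each class lie on the parallel lines $x_1 + x_2 = 2$ and $x_1 + x_2 = 4$, which is exactly the "too many points on the parallel hyperplanes" situation described above.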
A tentative statement of our goal is:
• Given a training set $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)$, maximize $\gamma$ (by varying $\mathbf{w}$ and $b$) subject to the constraint that, for all $i = 1, 2, \ldots, n$,

$$y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge \gamma$$
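For a fixed choice of $\mathbf{w}$ and $b$, the largest $\gamma$ satisfying every constraint is simply $\min_i y_i(\mathbf{w}\cdot\mathbf{x}_i + b)$; the goal stated above then varies $\mathbf{w}$ and $b$ to make this minimum as large as possible. A minimal Python sketch of that inner computation follows; the labeled data are hypothetical and the helpers `dot`, `largest_gamma`, and `satisfies` are illustrative names, not taken from the text.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))


def largest_gamma(w, b, data):
    """Largest gamma with y_i * (w.x_i + b) >= gamma for every (x_i, y_i).
    A positive value means (w, b) puts every point on its correct side."""
    return min(y * (dot(w, x) + b) for x, y in data)


def satisfies(w, b, data, gamma):
    """Check the constraint y_i * (w.x_i + b) >= gamma for all i."""
    return all(y * (dot(w, x) + b) >= gamma for x, y in data)


# Hypothetical training set of (x_i, y_i) pairs with y_i in {+1, -1}.
data = [((0, 0), -1), ((1, 1), -1), ((0, 2), -1),
        ((2, 2), +1), ((3, 1), +1), ((3, 3), +1)]
w, b = (1.0, 1.0), -3.0
g = largest_gamma(w, b, data)
print(g, satisfies(w, b, data, g), satisfies(w, b, data, g + 0.1))  # 1.0 True False
```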
Notice that $y_i$, which must be $+1$ or $-1$, determines which side of the hyperplane the point $\mathbf{x}_i$ must be on, so the $\ge$ relationship to $\gamma$ is always correct. However, it may be easier to express this condition as two cases: if $y_i = +1$, then $\mathbf{w}\cdot\mathbf{x}_i + b \ge \gamma$, and if $y_i = -1$, then $\mathbf{w}\cdot\mathbf{x}_i + b \le -\gamma$.
Unfortunately, this formulation doesn't really work properly. The problem is that by scaling up $\mathbf{w}$ and $b$, we can always allow a larger value of $\gamma$. For example, suppose that $\mathbf{w}$ and $b$ satisfy the constraint above. If we replace $\mathbf{w}$ by $2\mathbf{w}$ and $b$ by $2b$, we observe that for all $i$,

$$y_i\big((2\mathbf{w})\cdot\mathbf{x}_i + 2b\big) = 2\,y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 2\gamma.$$

Thus, $2\mathbf{w}$ and $2b$ are always a better choice than $\mathbf{w}$ and $b$, so there is no best choice and no maximum $\gamma$.
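The scaling argument is easy to check numerically. In the Python sketch below (hypothetical data; `functional_margin` and `geometric_margin` are names often used in the SVM literature, but the functions here are our own illustrations), multiplying $\mathbf{w}$ and $b$ by $k$ multiplies the attainable $\gamma$ by $k$, while the same quantity divided by $\|\mathbf{w}\|$, which is the true Euclidean distance to the closest point, does not change at all. That scale-invariant quantity is what the normalization in Section 12.3.2 works with.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))


def functional_margin(w, b, data):
    """min_i y_i (w.x_i + b): the gamma of the tentative formulation."""
    return min(y * (dot(w, x) + b) for x, y in data)


def geometric_margin(w, b, data):
    """Functional margin divided by ||w||: the Euclidean distance from the
    hyperplane to the closest point (positive if every point is on its correct side)."""
    return functional_margin(w, b, data) / math.sqrt(dot(w, w))


# Hypothetical training set; (w, b) = ((1, 1), -3) separates it.
data = [((0, 0), -1), ((1, 1), -1), ((0, 2), -1),
        ((2, 2), +1), ((3, 1), +1), ((3, 3), +1)]
w, b = (1.0, 1.0), -3.0

results = []
for k in (1, 2, 10):
    wk, bk = tuple(k * c for c in w), k * b   # scale w and b by k
    results.append((k, functional_margin(wk, bk, data), geometric_margin(wk, bk, data)))

for k, fm, gm in results:
    print(k, fm, round(gm, 4))
# 1 1.0 0.7071
# 2 2.0 0.7071
# 10 10.0 0.7071
```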
12.3.2 Normalizing the Hyperplane
The solution to the problem that we described intuitively above is to normalize the weight vector $\mathbf{w}$. That is, the unit of measure perpendicular to the separating hyperplane is the unit vector $\mathbf{w}/\|\mathbf{w}\|$. Recall that $\|\mathbf{w}\|$ is the Frobenius norm, or the square root of the sum of the squares of the components of $\mathbf{w}$ (for a vector, this is simply its ordinary Euclidean length). We shall require that $\mathbf{w}$ and $b$ be such that the parallel hyperplanes that just touch the support vectors are described by the equations $\mathbf{w}\cdot\mathbf{x} + b = +1$ and $\mathbf{w}\cdot\mathbf{x} + b = -1$, as suggested by Fig. 12.15.
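One way to carry out this normalization is sketched below in Python, under the assumption that some separating pair $(\mathbf{w}, b)$ is already known: divide both $\mathbf{w}$ and $b$ by $\gamma = \min_i y_i(\mathbf{w}\cdot\mathbf{x}_i + b)$. Afterwards the closest points satisfy $\mathbf{w}\cdot\mathbf{x} + b = \pm 1$ exactly, and since the hyperplanes $\mathbf{w}\cdot\mathbf{x} + b = 0$ and $\mathbf{w}\cdot\mathbf{x} + b = 1$ are $1/\|\mathbf{w}\|$ apart, the margin can then be read off as $1/\|\mathbf{w}\|$. The data and the helper name `canonicalize` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))


def canonicalize(w, b, data):
    """Rescale a separating (w, b) so that min_i y_i (w.x_i + b) == 1, which puts
    the closest points of each class on w.x + b = +1 and w.x + b = -1."""
    gamma = min(y * (dot(w, x) + b) for x, y in data)
    if gamma <= 0:
        raise ValueError("(w, b) does not strictly separate the data")
    return tuple(c / gamma for c in w), b / gamma


# Hypothetical training set and an arbitrarily scaled separator 5*x1 + 5*x2 - 15 = 0.
data = [((0, 0), -1), ((1, 1), -1), ((0, 2), -1),
        ((2, 2), +1), ((3, 1), +1), ((3, 3), +1)]
w, b = canonicalize((5.0, 5.0), -15.0, data)
values = [dot(w, x) + b for x, _ in data]
margin = 1 / math.sqrt(dot(w, w))   # distance from w.x + b = 0 to either parallel hyperplane

print(w, b)              # (1.0, 1.0) -3.0
print(values)            # [-3.0, -1.0, -1.0, 1.0, 1.0, 3.0]
print(round(margin, 4))  # 0.7071
```

Dividing by $\gamma$ only fixes the scale; it does not choose the direction of $\mathbf{w}$, so finding the best separator is still a separate optimization.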