Large-Scale Machine Learning - Mining of Massive Datasets


We also see in Fig. 12.14 two parallel hyperplanes at distance γ from the central hyperplane w · x + b = 0, and these each touch one or more of the support vectors. The latter are the points that actually constrain the dividing hyperplane, in the sense that they are all at distance γ from the hyperplane. In most cases, a d-dimensional set of points has d + 1 support vectors, as is the case in Fig. 12.14. However, there can be more support vectors if more than d + 1 points happen to lie on the parallel hyperplanes. We shall see an example based on the points of Fig. 11.1, where it turns out that all four points are support vectors, even though two-dimensional data normally has three.
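To make the notion of a support vector concrete, the following is a minimal sketch in Python with NumPy. The four points, their labels, and the hyperplane w = [−1, 1], b = 0 are hypothetical values chosen for illustration; they are not taken from the figures cited above. The sketch computes each point's distance to the hyperplane and marks as support vectors the points that attain the minimum distance γ.

```python
import numpy as np

# Hypothetical toy training set: four 2-D points with labels +1 / -1.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1, -1, 1, -1])

# Assumed separating hyperplane w . x + b = 0 for this toy data.
w = np.array([-1.0, 1.0])
b = 0.0

# Distance of each point to the hyperplane, signed so that it is positive
# when the point lies on the side indicated by its label.
distances = y * (X @ w + b) / np.linalg.norm(w)

# gamma is the smallest such distance; the support vectors are the points
# that lie exactly at distance gamma, i.e. on the two parallel hyperplanes.
gamma = distances.min()
is_support = np.isclose(distances, gamma)

print("gamma =", gamma)
print("support vectors:")
print(X[is_support])
```

In this toy set every point lies at the same distance from the hyperplane, so all four are flagged as support vectors, one more than the d + 1 = 3 that two-dimensional data usually has.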

A tentative statement of our goal is:

• Given a training set (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n), maximize γ (by varying w and b) subject to the constraint that, for all i = 1, 2, . . . , n,

y_i (w · x_i + b) ≥ γ

Notice that y_i, which must be +1 or −1, determines which side of the hyperplane the point x_i must be on, so the ≥ relationship to γ is always correct. However, it may be easier to express this condition as two cases: if y_i = +1, then w · x_i + b ≥ γ, and if y_i = −1, then w · x_i + b ≤ −γ.

Unfortunately, this formulation doesn't really work properly. The problem is that by scaling up w and b, we can always allow a larger value of γ. For example, suppose that w and b satisfy the constraint above. If we replace w by 2w and b by 2b, we observe that for all i, y_i ((2w) · x_i + 2b) ≥ 2γ. Thus, 2w and 2b is always a better choice than w and b, so there is no best choice and no maximum γ.
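The scaling argument can be checked numerically. The sketch below uses a hypothetical four-point training set and a hypothetical hyperplane (w, b); the helper name largest_gamma is ours. For fixed w and b, the largest γ that satisfies the constraint is the minimum of y_i (w · x_i + b) over all i, and multiplying w and b by a constant multiplies that γ by the same constant.

```python
import numpy as np

def largest_gamma(X, y, w, b):
    """Largest gamma with y_i (w . x_i + b) >= gamma for all i, for fixed w, b."""
    return float(np.min(y * (X @ w + b)))

# Hypothetical toy training set and separating hyperplane.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1, -1, 1, -1])
w = np.array([-1.0, 1.0])
b = 0.0

# Scaling w and b by c scales the attainable gamma by c, although the
# hyperplane w . x + b = 0 itself does not move.
scales = [1, 2, 4, 8]
gammas = [largest_gamma(X, y, c * w, c * b) for c in scales]

for c, g in zip(scales, gammas):
    print(f"scale {c}: gamma = {g}")
```

Since γ grows without bound while the hyperplane stays the same, the tentative formulation has no maximum.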

12.3.2 Normalizing the Hyperplane

The solution to the problem that we described intuitively above is to normalize the weight vector w. That is, the unit of measure perpendicular to the separating hyperplane is the unit vector w/||w||. Recall that ||w|| is the Euclidean norm of w (the Frobenius norm of w viewed as a one-column matrix), that is, the square root of the sum of the squares of the components of w. We shall require that w be such that the parallel hyperplanes that just touch the support vectors are described by the equations w · x + b = +1 and w · x + b = −1, as suggested by Fig. 12.15.
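As an illustration of this normalization, the sketch below rescales a separating (w, b) so that the closest training points satisfy y_i (w · x_i + b) = 1. The data, the starting hyperplane, and the helper name normalize_hyperplane are hypothetical. The final line relies on a standard geometric fact that is not stated in the passage above: the hyperplanes w · x + b = 0 and w · x + b = 1 lie at distance 1/||w|| from each other.

```python
import numpy as np

def normalize_hyperplane(X, y, w, b):
    """Rescale (w, b) so that min_i y_i (w . x_i + b) equals 1."""
    m = np.min(y * (X @ w + b))
    if m <= 0:
        raise ValueError("w and b do not separate the training set")
    return w / m, b / m

# Hypothetical toy training set; (w, b) separates it but at an arbitrary scale.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1, -1, 1, -1])
w = np.array([-3.0, 3.0])
b = 0.0

w_n, b_n = normalize_hyperplane(X, y, w, b)

# After normalization the support vectors lie on w_n . x + b_n = +1 or -1,
# and the distance from the central hyperplane to either one is 1 / ||w_n||.
margin = 1.0 / np.linalg.norm(w_n)

print("normalized w =", w_n, " b =", b_n)
print("margin =", margin)
```

With the scale fixed in this way, doubling w and b is no longer allowed, because it would move the parallel hyperplanes off the support vectors.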
