[Figure 4.11 Dividing Line Margin: class A and class B points plotted against axes X and Y, with the margin marked.]
classifier. Is one better than the other? Intuitively, one might expect the
center line to generalize better. How might this center line be located
computationally?
Suppose now that, instead of a line, we use a bar to divide the points
(Figure 4.11). If the bar is rotated as needed and made as wide as possible
without overlapping any of the points, the line that bisects it lengthwise is
the dividing line expected to generalize best. (Note: For the interested reader,
proofs that this is indeed the dividing line that generalizes best can be found
in numerous papers and books on support vector machines.)
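The bar idea can be tried directly on a small example. The sketch below is a brute-force illustration, not the method SVM software uses (practical solvers pose the problem as a quadratic optimization). It assumes two-dimensional, linearly separable data, rotates the bar through a fixed number of orientations, and at each one measures how wide the bar can be made before it touches a point. The toy points and the names `widest_bar` and `steps` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy data: 2-D points for class A and class B.
class_a = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5)]
class_b = [(4.0, 4.0), (5.0, 3.5), (4.5, 5.0)]


def widest_bar(a_pts, b_pts, steps=3600):
    """Rotate a bar through `steps` orientations and keep the widest one.

    For a unit normal n = (cos t, sin t), every point projects to a scalar
    n . x.  A bar perpendicular to n fits between the classes when all
    class-A projections lie below all class-B projections; its width is
    min(class-B projections) - max(class-A projections).
    """
    best = None
    for k in range(steps):
        t = 2 * math.pi * k / steps
        n = (math.cos(t), math.sin(t))
        proj_a = [n[0] * x + n[1] * y for x, y in a_pts]
        proj_b = [n[0] * x + n[1] * y for x, y in b_pts]
        width = min(proj_b) - max(proj_a)
        if best is None or width > best[0]:
            # The lengthwise bisector sits halfway across the bar.
            offset = (min(proj_b) + max(proj_a)) / 2
            best = (width, n, offset)
    return best  # (width, unit normal n, offset): dividing line is n . x = offset


width, normal, offset = widest_bar(class_a, class_b)
print(f"bar width {width:.3f}, margin {width / 2:.3f}")
print(f"dividing line: {normal[0]:.3f}*x + {normal[1]:.3f}*y = {offset:.3f}")
```

The sweep only approximates the best orientation to within its angular step. If every orientation gives a width of zero or less, no bar fits at all, meaning the classes cannot be separated by a line.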
One-half of the width of the bar in Figure 4.11 is known as the margin, which
is interpreted as the distance the dividing line can be moved (parallel to
itself, in either direction) without introducing classification error. Hence,
the classifier construction problem can be recast as that of locating the
dividing line having the greatest margin.
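To make the margin concrete, the following sketch computes it for a candidate line w . x + b = 0 as the smallest signed distance from a labelled point to that line, then compares two separating lines with the same orientation: one centered between the classes and one shifted toward class A. The toy points, the labels -1 and +1 for classes A and B, and the function name `margin` are illustrative assumptions.

```python
import math

# Hypothetical toy data: (point, label) with class A = -1 and class B = +1.
points = [((1.0, 1.0), -1), ((2.0, 1.5), -1), ((1.5, 2.5), -1),
          ((4.0, 4.0), +1), ((5.0, 3.5), +1), ((4.5, 5.0), +1)]


def margin(w, b, data):
    """Margin of the line w . x + b = 0 on labelled data.

    Each point's signed distance y * (w . x + b) / |w| is positive when the
    point lies on its own class's side.  The smallest of these is how far the
    line can be shifted parallel to itself before a point is misclassified;
    a negative result means the line already misclassifies some point.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm for (x1, x2), y in data)


# Two separating lines of the form x + y = c:
centered = margin((1.0, 1.0), -6.0, points)    # c = 6, halfway between the classes
off_center = margin((1.0, 1.0), -4.5, points)  # c = 4.5, close to class A
print(f"centered: {centered:.3f}, off-center: {off_center:.3f}")
```

Both lines classify every point correctly, but the centered one can be shifted four times as far before a point ends up on the wrong side, which is the sense in which it is expected to generalize better.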
Consider now the problem where it is not possible to separate the points
completely with a dividing line (Figure 4.12). The solution is to assign a slack
value (ξ) to each offending point, equal to the amount needed to push that point
back to its own side of the margin; for most points in the dataset, the required
slack is zero. The dividing line must therefore be chosen to keep the total
slack small, so the problem becomes one of maximizing the margin while at the
same time minimizing the total slack. Although this text does not go into the
mathematics of the search mechanism for locating the optimal solution, suffice
it to say that it optimizes a combination of the two objectives: maximizing the
margin and minimizing the total slack.
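One common way to combine the two objectives is the standard soft-margin formulation; it is used here as an assumption, since the text does not say which combination it means. It minimizes 1/2 |w|^2 + C times the sum of the slacks, where a smaller |w| corresponds to a wider margin and the hypothetical weight C sets the trade-off. In this formulation the margin edges are the lines w . x + b = +1 and w . x + b = -1, a point's slack is max(0, 1 - y(w . x + b)), and dividing that value by |w| converts it into the distance described above. The sketch minimizes the objective by plain subgradient descent, chosen only because it fits in a few lines; the data, learning rate, epoch count, and function names are all made up for illustration.

```python
# Hypothetical non-separable toy data: (point, label), class A = -1, class B = +1.
# The last class-B point lies inside the triangle formed by the class-A points.
data = [((1.0, 1.0), -1), ((2.0, 1.5), -1), ((1.5, 2.5), -1),
        ((4.0, 4.0), +1), ((5.0, 3.5), +1), ((4.5, 5.0), +1),
        ((1.5, 1.5), +1)]


def slacks(w, b, data):
    """Slack of each point: how far it falls short of its side of the margin
    (in units where the margin edges are w . x + b = +1 and -1)."""
    return [max(0.0, 1.0 - y * (w[0] * x1 + w[1] * x2 + b))
            for (x1, x2), y in data]


def objective(w, b, data, C):
    """Soft-margin trade-off: 1/2 |w|^2 (small |w| = wide margin) + C * total slack."""
    return 0.5 * (w[0] ** 2 + w[1] ** 2) + C * sum(slacks(w, b, data))


def train(data, C=1.0, lr=0.001, epochs=5000):
    """Minimize the objective with full-batch subgradient descent (illustrative)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [w[0], w[1]], 0.0  # gradient of the 1/2 |w|^2 term
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) < 1.0:  # point has positive slack
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb -= C * y
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


w, b = train(data)
print("w =", w, "b =", b)
print("slacks:", [round(s, 2) for s in slacks(w, b, data)])
print("objective:", round(objective(w, b, data, C=1.0), 3))
```

Because the extra class B point at (1.5, 1.5) lies inside the triangle of class A points, no line can give every point zero slack, which is exactly the non-separable case described above. Raising C penalizes slack more heavily relative to margin width.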