. . . , x_{id}]. Also, as before, w = [w_1, w_2, . . . , w_d]. Note that the two summations express the dot product of vectors.
The constant C, called the regularization parameter, reflects how important misclassification is. Pick a large C if you really do not want to misclassify points, even at the cost of a narrow margin. Pick a small C if you can tolerate some misclassified points but want most of the points to be far from the boundary (i.e., you want a large margin).
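As a minimal sketch of this trade-off (the synthetic data and candidate weight vectors here are our own illustration, not from the book), consider the objective f(w, b) = ||w||^2/2 + C Σ_i L(x_i, y_i) evaluated for two choices of w, one with a small norm but some hinge loss and one with a larger norm but zero hinge loss:

import numpy as np

def objective(w, b, X, y, C):
    z = y * (X @ w + b)                 # z_i = y_i (w . x_i + b)
    hinge = np.maximum(0.0, 1.0 - z)    # hinge loss L(x_i, y_i)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_wide   = np.array([0.2, 0.2])   # small ||w|| (wide margin), some hinge loss
w_strict = np.array([0.5, 0.5])   # larger ||w||, every z_i >= 1 (zero loss)

for C in (0.01, 1.0):
    fa = objective(w_wide, 0.0, X, y, C)
    fb = objective(w_strict, 0.0, X, y, C)
    print(f"C={C}: wide-margin f={fa:.3f}, strict f={fb:.3f}")

With C = 0.01 the wide-margin w wins (f = 0.055 versus 0.25); with C = 1 the zero-loss w wins (f = 0.25 versus 1.54), matching the intuition above.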
We must explain the penalty function (the second term) in Equation 12.4. The summation over i has one term L(x_i, y_i) for each training example x_i. The quantity L is a hinge function, suggested in Fig. 12.17, and we call its value the hinge loss. Let z_i = y_i(w · x_i + b) = y_i(Σ_{j=1}^{d} w_j x_{ij} + b). When z_i is 1 or more, the value of L is 0. But for smaller values of z_i, L rises linearly as z_i decreases; that is, L(x_i, y_i) = max{0, 1 − z_i}.
Figure 12.17 The hinge function decreases linearly for z ≤ 1 and then remains 0
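As a quick numeric check of the shape in Fig. 12.17 (a small sketch; the function name is ours, not the book's), we can evaluate the hinge function at a few points:

def hinge(z):
    # Hinge function of Fig. 12.17: rises linearly as z decreases below 1,
    # and is 0 for all z >= 1.
    return max(0.0, 1.0 - z)

for z in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"L({z}) = {hinge(z)}")   # prints 2.0, 1.0, 0.5, 0.0, 0.0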
Since we shall need to take the derivative of L(x_i, y_i) with respect to each w_j, note that the derivative of the hinge function is discontinuous. It is −y_i x_{ij} for z_i < 1 and 0 for z_i > 1. That is, if y_i = +1 (i.e., the i-th training example is positive), then

    ∂L/∂w_j = 0 if Σ_{j=1}^{d} w_j x_{ij} + b ≥ 1, and −x_{ij} otherwise.

Moreover, if y_i = −1 (i.e., the i-th training example is negative), then

    ∂L/∂w_j = 0 if Σ_{j=1}^{d} w_j x_{ij} + b ≤ −1, and x_{ij} otherwise.

The two cases can be summarized as one, if we include the value of y_i, as:

    ∂L/∂w_j = 0 if y_i(Σ_{j=1}^{d} w_j x_{ij} + b) ≥ 1, and −y_i x_{ij} otherwise.    (12.5)
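The piecewise rule of Equation 12.5 translates directly into code. The following sketch (the helper name and the tiny data set are illustrative, not from the book) computes ∂L/∂w_j for every training example at once; at z_i = 1, where the derivative is undefined, it follows the common convention of treating the hinge as inactive:

import numpy as np

def hinge_gradient(w, b, X, y):
    # Row i holds dL/dw_j for example x_i, per Equation 12.5:
    # 0 where y_i (w . x_i + b) >= 1, and -y_i * x_ij otherwise.
    z = y * (X @ w + b)
    active = (z < 1.0).astype(float)     # 1 where the hinge loss is positive
    return -(active * y)[:, None] * X

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.zeros(2), 0.0

print(hinge_gradient(w, b, X, y))   # with w = 0, every z_i = 0 < 1,
                                    # so row i is simply -y_i * x_i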