Database Reference
In-Depth Information
12.3.4
SVM Solutions by Gradient Descent
A common approach to solving Equation 12.4 is to use quadratic programming. For large-
scale data, another approach, gradient descent has an advantage. We can allow the data to
reside on disk, rather than keeping it all in memory, which is normally required for quadrat-
ic solvers. To implement gradient descent, we compute the derivative of the equation with
respect to b and each component w j of the vector w . Since we want to minimize f ( w , b ),
we move b and the components w j in the direction opposite to the direction of the gradient.
The amount we move each component is proportional to the derivative with respect to that
component.
Our first step is to use the trick of Section 12.2.4 to make b part of the weight vector w .
Notice that b is really the negative of a threshold on the dot product w . x , so we can append
a ( d + 1)st component b to w and append an extra component with value +1 to every feature
vector in the training set (not −1 as we did in Section 12.2.4 ).
We must choose a constant η to be the fraction of the gradient that we move w in each
round. That is, we assign
for all j = 1, 2, . . . , d + 1.
The derivative of the first term in Equation 12.4 , is easy; it is w j . 2 However, the
second term involves the hinge function, so it is harder to express. We shall use an if-then
expression to describe these derivatives, as in Equation 12.5 . That is:
(12.6)
Note that this formula applies to w d +1 , which is b , as well as to the weights w 1 , w 2 , . . . , w d .
We continue to use b instead of the equivalent w d +1 in the if-then condition to remind us of
the form in which the desired hyperplane is described.
To execute the gradient-descent algorithm on a training set, we pick:
(1) Values for the parameters C and η .
(2) Initial values for w , including the ( d + 1)st component b .
Then, we repeatedly:
(a) Compute the partial derivatives of f ( w , b ) with respect to the w j .
(b) Adjust the values of w by subtracting from each w j .
Search WWH ::




Custom Search