look at a single nonblank entry of M and compute the change to each element of U and V that would make the product UV agree with that element of M. We would not make that change to the elements of U and V completely, but rather choose some learning rate η less than 1 and change each element of U and V by the fraction η of the amount that would be necessary to make UV equal M in the chosen entry.
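In Python, the update for one chosen entry might look like the following sketch; the learning-rate value, the matrix shapes, and the guard against zero divisors are illustrative assumptions rather than part of the text.

import numpy as np

def update_one_entry(U, V, M, i, j, eta=0.1, eps=1e-12):
    """Nudge U and V a fraction eta of the way toward making (UV)[i,j] equal M[i,j]."""
    err = M[i, j] - U[i, :] @ V[:, j]      # how far the product misses this entry of M
    dU = np.zeros(U.shape[1])
    dV = np.zeros(V.shape[0])
    for k in range(U.shape[1]):
        if abs(V[k, j]) > eps:             # change to U[i,k] alone that would close the gap
            dU[k] = err / V[k, j]
        if abs(U[i, k]) > eps:             # change to V[k,j] alone that would close the gap
            dV[k] = err / U[i, k]
    U[i, :] += eta * dU                    # apply only the fraction eta of each change
    V[:, j] += eta * dV
    return U, V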
12.3.6 Parallel Implementation of SVM
One approach to parallelism for SVM is analogous to what we suggested for perceptrons in Section 12.2.8. You can start with the current w and b, and in parallel do several iterations based on each training example. Then average the changes for each of the examples to create a new w and b. If we distribute w and b to each mapper, then the Map tasks can do as many iterations as we wish to do in one round, and we need the Reduce tasks only to average the results. One iteration of MapReduce is needed for each round.
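A minimal sketch of this first approach, with each Map task simulated as an ordinary function call, might look as follows; the regularization constant C, the learning rate eta, the number of local iterations, and the chunking of the training set are all illustrative assumptions.

import numpy as np

def map_task(w, b, X, y, eta=0.01, C=1.0, iters=10):
    """One Map task: run several gradient-descent iterations on its own chunk of examples."""
    w, b = w.copy(), b
    for _ in range(iters):
        margins = y * (X @ w + b)
        viol = margins < 1                 # examples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

def reduce_task(results):
    """The Reduce step: average the (w, b) pairs produced by the Map tasks."""
    ws, bs = zip(*results)
    return np.mean(ws, axis=0), np.mean(bs)

def one_round(w, b, chunks):
    """One MapReduce round: map over the chunks (in parallel), then average."""
    return reduce_task([map_task(w, b, X, y) for X, y in chunks])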
A second approach is to follow the prescription given here, but implement the computation of the second term in Equation 12.4 in parallel. The contribution from each training example can then be summed. This approach requires one round of MapReduce for each iteration of gradient descent.
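The second approach might be sketched as follows, where each Map call computes one example's contribution to the gradient of the second (hinge-loss) term of Equation 12.4 and the Reduce step sums them; the constant C and the learning rate eta are again illustrative assumptions.

import numpy as np

def map_gradient(w, b, x, y):
    """Gradient contribution of a single training example to the hinge-loss term."""
    if y * (np.dot(w, x) + b) >= 1:
        return np.zeros_like(w), 0.0       # outside the margin: contributes nothing
    return -y * x, -y                      # inside the margin: subgradient of the hinge

def gradient_descent_step(w, b, X, Y, eta=0.01, C=1.0):
    """One iteration: sum the per-example contributions (the parallel part), then update."""
    contribs = [map_gradient(w, b, x, y) for x, y in zip(X, Y)]   # Map, done in parallel
    sum_w = sum(c[0] for c in contribs)                           # Reduce: sum w-contributions
    sum_b = sum(c[1] for c in contribs)                           # Reduce: sum b-contributions
    w_new = w - eta * (w + C * sum_w)      # regularization term plus summed hinge term
    b_new = b - eta * (C * sum_b)
    return w_new, b_new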
12.3.7 Exercises for Section 12.3
EXERCISE 12.3.1 Continue the iterations of Fig. 12.20 for three more iterations.
EXERCISE 12.3.2 The following training set obeys the rule that the positive examples all
have vectors whose components sum to 10 or more, while the sum is less than 10 for the
negative examples.
([3, 4, 5], +1) ([2, 7, 2], +1) ([5, 5, 5], +1)
([1, 2, 3], −1) ([3, 3, 2], −1) ([2, 4, 1], −1)
(a) Which of these six vectors are the support vectors?
! (b) Suggest a vector w and constant b such that the hyperplane defined by w.x + b = 0 is a good separator for the positive and negative examples. Make sure that the scale of w is such that all points are outside the margin; that is, for each training example (x, y), you have y(w.x + b) ≥ +1.
! (c) Starting with your answer to part (b), use gradient descent to find the optimum w and b. Note that if you start with a separating hyperplane, and you scale w properly, then the second term of Equation 12.4 is always 0, which simplifies the gradient computation.