look at a single nonblank entry of M and compute the change to each element of U and V that would make the product UV agree with that element of M. We would not make that change to the elements of U and V completely, but rather choose some learning rate η less than 1 and change each element of U and V by the fraction η of the amount that would be necessary to make UV equal M in the chosen entry.
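In Python, the update for one chosen entry might look like the following sketch; the learning-rate value, the matrix shapes, and the guard against zero divisors are illustrative assumptions rather than part of the text.

import numpy as np

def update_one_entry(U, V, M, i, j, eta=0.1, eps=1e-12):
    """Nudge U and V a fraction eta of the way toward making (UV)[i,j] equal M[i,j]."""
    err = M[i, j] - U[i, :] @ V[:, j]      # how far the product misses this entry of M
    dU = np.zeros(U.shape[1])
    dV = np.zeros(V.shape[0])
    for k in range(U.shape[1]):
        if abs(V[k, j]) > eps:             # change to U[i,k] alone that would close the gap
            dU[k] = err / V[k, j]
        if abs(U[i, k]) > eps:             # change to V[k,j] alone that would close the gap
            dV[k] = err / U[i, k]
    U[i, :] += eta * dU                    # apply only the fraction eta of each change
    V[:, j] += eta * dV
    return U, V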
12.3.6 Parallel Implementation of SVM
One approach to parallelism for SVM is analogous to what we suggested for perceptrons in Section 12.2.8. You can start with the current w and b, and in parallel do several iterations based on each training example. Then average the changes for each of the examples to create a new w and b. If we distribute w and b to each mapper, then the Map tasks can do as many iterations as we wish to do in one round, and we need the Reduce tasks only to average the results. One iteration of MapReduce is needed for each round.
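A minimal sketch of this first approach, with each Map task simulated as an ordinary function call, might look as follows; the regularization constant C, the learning rate eta, the number of local iterations, and the chunking of the training set are all illustrative assumptions.

import numpy as np

def map_task(w, b, X, y, eta=0.01, C=1.0, iters=10):
    """One Map task: run several gradient-descent iterations on its own chunk of examples."""
    w, b = w.copy(), b
    for _ in range(iters):
        margins = y * (X @ w + b)
        viol = margins < 1                 # examples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

def reduce_task(results):
    """The Reduce step: average the (w, b) pairs produced by the Map tasks."""
    ws, bs = zip(*results)
    return np.mean(ws, axis=0), np.mean(bs)

def one_round(w, b, chunks):
    """One MapReduce round: map over the chunks (in parallel), then average."""
    return reduce_task([map_task(w, b, X, y) for X, y in chunks])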
A second approach is to follow the prescription given here, but implement the computation of the second term in Equation 12.4 in parallel. The contribution from each training example can then be summed. This approach requires one round of MapReduce for each iteration of gradient descent.
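The second approach might be sketched as follows, where each Map call computes one example's contribution to the gradient of the second (hinge-loss) term of Equation 12.4 and the Reduce step sums them; the constant C and the learning rate eta are again illustrative assumptions.

import numpy as np

def map_gradient(w, b, x, y):
    """Gradient contribution of a single training example to the hinge-loss term."""
    if y * (np.dot(w, x) + b) >= 1:
        return np.zeros_like(w), 0.0       # outside the margin: contributes nothing
    return -y * x, -y                      # inside the margin: subgradient of the hinge

def gradient_descent_step(w, b, X, Y, eta=0.01, C=1.0):
    """One iteration: sum the per-example contributions (the parallel part), then update."""
    contribs = [map_gradient(w, b, x, y) for x, y in zip(X, Y)]   # Map, done in parallel
    sum_w = sum(c[0] for c in contribs)                           # Reduce: sum w-contributions
    sum_b = sum(c[1] for c in contribs)                           # Reduce: sum b-contributions
    w_new = w - eta * (w + C * sum_w)      # regularization term plus summed hinge term
    b_new = b - eta * (C * sum_b)
    return w_new, b_new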
12.3.7 Exercises for Section 12.3
EXERCISE 12.3.1 Continue the iterations of Fig. 12.20 for three more iterations.
EXERCISE 12.3.2 The following training set obeys the rule that the positive examples all
have vectors whose components sum to 10 or more, while the sum is less than 10 for the
negative examples.
([3, 4, 5], +1) ([2, 7, 2], +1) ([5, 5, 5], +1)
([1, 2, 3], −1) ([3, 3, 2], −1) ([2, 4, 1], −1)
(a) Which of these six vectors are the support vectors?
! (b) Suggest a vector w and constant b such that the hyperplane defined by w.x + b = 0 is a good separator for the positive and negative examples. Make sure that the scale of w is such that all points are outside the margin; that is, for each training example (x, y), you have y(w.x + b) ≥ +1.
! (c) Starting with your answer to part (b), use gradient descent to find the optimum w and b. Note that if you start with a separating hyperplane, and you scale w properly, then the second term of Equation 12.4 is always 0, which simplifies the gradient computation.