labels are +1 or −1. Unlike the basic perceptron algorithm, which can produce positive or negative components in the weight vector w, Winnow produces only positive weights.
The general Winnow Algorithm allows a variety of parameters to be selected, and we shall consider only one simple variant. However, all variants have in common the idea that there is a positive threshold θ. If w is the current weight vector, and x is the feature vector in the training set that we are currently considering, we compute w.x and compare it with θ. If the dot product is too low, and the class for x is +1, then we have to raise the weights of w in those components where x has 1. We multiply these weights by a number greater than 1. The larger this number, the greater the training rate, so we want to pick a number that is not too close to 1 (or convergence will be too slow) but also not too large (or the weight vector may oscillate). Similarly, if w.x ≥ θ, but the class of x is −1, then we want to lower the weights of w in those components where x is 1. We multiply those weights by a number greater than 0 but less than 1. Again, we want to pick a number that is not too close to 1 (or convergence will be too slow) but also not too small (or the weight vector may oscillate).
We shall give the details of the algorithm using the factors 2 and 1/2 for the cases where we want to raise weights and lower weights, respectively. Start the Winnow Algorithm with a weight vector w = [w1, w2, . . . , wd], all of whose components are 1, and let the threshold θ equal d, the number of dimensions of the vectors in the training examples. Let (x, y) be the next training example to be considered, where x = [x1, x2, . . . , xd].
(1) If w.x > θ and y = +1, or w.x < θ and y = −1, then the example is correctly classified, so no change to w is made.
(2) If w.x ≤ θ, but y = +1, then the weights for the components where x has 1 are too low as a group. Double each of the corresponding components of w. That is, if xi = 1, then set wi := 2wi.
(3) If w.x ≥ θ, but y = −1, then the weights for the components where x has 1 are too high as a group. Halve each of the corresponding components of w. That is, if xi = 1, then set wi := wi/2.
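The three-case update above can be sketched as a short function. This is a minimal illustration of the rule as stated, using the factors 2 and 1/2; the function name winnow_update and its signature are our own, not from the text.

```python
def winnow_update(w, x, y, theta):
    """Apply one Winnow training step to the weight vector w, in place.

    w     -- list of positive weights, one per dimension
    x     -- 0/1 feature vector of the same length as w
    y     -- the class label, +1 or -1
    theta -- the positive threshold (the text takes theta = d)
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot <= theta and y == +1:
        # Case (2): weights for the 1-components of x are too low; double them.
        for i, xi in enumerate(x):
            if xi == 1:
                w[i] *= 2
    elif dot >= theta and y == -1:
        # Case (3): weights for the 1-components of x are too high; halve them.
        for i, xi in enumerate(x):
            if xi == 1:
                w[i] /= 2
    # Case (1): otherwise the example was classified correctly; w is unchanged.
    return w
```

Note that only the components where x has a 1 are ever touched, and each is multiplied by a positive factor, so starting from all-positive weights the vector stays positive, as claimed at the start of this section.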
Start with the weight vector w = [1, 1, 1, 1, 1] and let θ = 5. First, consider the feature vector a = [1, 1, 0, 1, 1]. We find w.a = 4, which is less than θ. Since the associated label for a is +1, this example is misclassified. When a +1-labeled example is misclassified, we must double all the components of w where the example has 1; in this case, all but the third component of a is 1. Thus, the new value of w is [2, 2, 1, 2, 2].
Next, we consider the training example b = [0, 0, 1, 1, 0]. Now w.b = 3, which is less than θ. However, the associated label for b is −1, so no change to w is needed.
For c = [0, 1, 1, 0, 0] we find w.c = 3 < θ, while the associated label is +1. Thus, we double the components of w where the corresponding components of c are 1. These components are the second and third, so the new value of w is [2, 4, 2, 2, 2].
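The worked example can be replayed in a few lines of code. The loop below inlines the update rule with factors 2 and 1/2, starting from w = [1, 1, 1, 1, 1] and θ = 5 as above; the variable names are ours.

```python
theta = 5
w = [1, 1, 1, 1, 1]

# The three training examples a, b, c from the text, with their labels.
training = [
    ([1, 1, 0, 1, 1], +1),  # a: w.a = 4 <= theta, label +1 -> double
    ([0, 0, 1, 1, 0], -1),  # b: w.b = 3 <  theta, label -1 -> no change
    ([0, 1, 1, 0, 0], +1),  # c: w.c = 3 <= theta, label +1 -> double
]

for x, y in training:
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot <= theta and y == +1:
        # Double the weights where x has 1.
        w = [2 * wi if xi == 1 else wi for wi, xi in zip(w, x)]
    elif dot >= theta and y == -1:
        # Halve the weights where x has 1.
        w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x)]

print(w)  # the final weight vector [2, 4, 2, 2, 2] from the text
```

Running this reproduces the sequence of weight vectors traced above: [2, 2, 1, 2, 2] after a, unchanged after b, and [2, 4, 2, 2, 2] after c.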