labels are +1 or −1. Unlike the basic perceptron algorithm, which can produce positive or negative components in the weight vector w, Winnow produces only positive weights.
The general Winnow Algorithm allows a variety of parameters to be selected, and we shall consider only one simple variant. However, all variants have in common the idea that there is a positive threshold θ. If w is the current weight vector, and x is the feature vector in the training set that we are currently considering, we compute w.x and compare it with θ. If the dot product is too low, and the class for x is +1, then we have to raise the weights of w in those components where x has 1. We multiply these weights by a number greater than 1. The larger this number, the greater the training rate, so we want to pick a number that is not too close to 1 (or convergence will be too slow) but also not too large (or the weight vector may oscillate). Similarly, if w.x ≥ θ, but the class of x is −1, then we want to lower the weights of w in those components where x is 1. We multiply those weights by a number greater than 0 but less than 1. Again, we want to pick a number that is not too close to 1 but also not too small, to avoid slow convergence or oscillation.
We shall give the details of the algorithm using the factors 2 and 1/2 for the cases where we want to raise weights and lower weights, respectively. Start the Winnow Algorithm with a weight vector w = [w_1, w_2, ..., w_d], all of whose components are 1, and let the threshold θ equal d, the number of dimensions of the vectors in the training examples. Let (x, y) be the next training example to be considered, where x = [x_1, x_2, ..., x_d].
(1) If w.x > θ and y = +1, or w.x < θ and y = −1, then the example is correctly classified, so no change to w is made.
(2) If w.x ≤ θ, but y = +1, then the weights for the components where x has 1 are too low as a group. Double each of the corresponding components of w. That is, if x_i = 1 then set w_i := 2w_i.
(3) If w.x ≥ θ, but y = −1, then the weights for the components where x has 1 are too high as a group. Halve each of the corresponding components of w. That is, if x_i = 1 then set w_i := w_i/2.
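The update rules above translate directly into a short loop. Below is a minimal Python sketch of one pass over the training set, using the factors 2 and 1/2; the function name winnow_epoch and its signature are illustrative choices, not part of the algorithm's specification.

def winnow_epoch(examples, w, theta):
    # examples: list of (x, y) pairs, where x is a 0/1 feature vector
    #           and y is the label +1 or -1.
    # w:        list of positive weights, one per feature.
    # theta:    positive threshold (the text takes theta = d).
    for x, y in examples:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        if dot <= theta and y == +1:
            # Case (2): weights where x has 1 are too low; double them.
            for i in range(len(w)):
                if x[i] == 1:
                    w[i] *= 2
        elif dot >= theta and y == -1:
            # Case (3): weights where x has 1 are too high; halve them.
            for i in range(len(w)):
                if x[i] == 1:
                    w[i] /= 2
        # Case (1): otherwise the example is correctly classified; no change.
    return w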
EXAMPLE 12.5 Let us reconsider the training data from Fig. 12.6. Initialize w = [1, 1, 1, 1, 1] and let θ = 5. First, consider the feature vector a = [1, 1, 0, 1, 1]. w.a = 4, which is less than θ. Since the associated label for a is +1, this example is misclassified. When a +1-labeled example is misclassified, we must double all the components of w where the example has 1; in this case, all but the third component of a are 1. Thus, the new value of w is [2, 2, 1, 2, 2].
Next, we consider training example b = [0, 0, 1, 1, 0]. w.b = 3, which is less than θ. However, the associated label for b is −1, so the example is correctly classified, and no change to w is needed.
For c = [0, 1, 1, 0, 0] we find w.c = 3 < θ, while the associated label is +1. Thus, we double the components of w where the corresponding components of c are 1. These components are the second and third, so the new value of w is [2, 4, 2, 2, 2].
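As a check, running the sketch above on these three examples reproduces the weight vectors computed in this example:

examples = [
    ([1, 1, 0, 1, 1], +1),  # a: w.a = 4 <= 5 with label +1, so double
    ([0, 0, 1, 1, 0], -1),  # b: w.b = 3 < 5 with label -1, correctly classified
    ([0, 1, 1, 0, 0], +1),  # c: w.c = 3 <= 5 with label +1, so double
]
w = winnow_epoch(examples, [1, 1, 1, 1, 1], theta=5)
print(w)  # prints [2, 4, 2, 2, 2], matching the text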