labels are +1 or −1. Unlike the basic perceptron algorithm, which can produce positive or negative components in the weight vector w, Winnow produces only positive weights.
The general Winnow Algorithm allows a variety of parameters to be selected, and we shall consider only one simple variant. However, all variants have in common the idea that there is a positive threshold θ. If w is the current weight vector, and x is the feature vector in the training set that we are currently considering, we compute w.x and compare it with θ. If the dot product is too low, and the class for x is +1, then we have to raise the weights of w in those components where x has 1. We multiply these weights by a number greater than 1. The larger this number, the greater the training rate, so we want to pick a number that is not too close to 1 (or convergence will be too slow) but also not too large (or the weight vector may oscillate). Similarly, if w.x ≥ θ, but the class of x is −1, then we want to lower the weights of w in those components where x is 1. We multiply those weights by a number greater than 0 but less than 1. Again, we want to pick a number that is not too close to 1 (or convergence will be too slow) but also not too small (or the weight vector may oscillate).
We shall give the details of the algorithm using the factors 2 and 1/2 for the cases where we want to raise weights and lower weights, respectively. Start the Winnow Algorithm with a weight vector w = [w1, w2, . . . , wd], all of whose components are 1, and let the threshold θ equal d, the number of dimensions of the vectors in the training examples. Let (x, y) be the next training example to be considered, where x = [x1, x2, . . . , xd].
(1) If w.x > θ and y = +1, or w.x < θ and y = −1, then the example is correctly classified, so no change to w is made.
(2) If w.x ≤ θ, but y = +1, then the weights for the components where x has 1 are too low as a group. Double each of the corresponding components of w. That is, if xi = 1, then set wi := 2wi.
(3) If w.x ≥ θ, but y = −1, then the weights for the components where x has 1 are too high as a group. Halve each of the corresponding components of w. That is, if xi = 1, then set wi := wi/2.
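The three-case update above can be sketched as a short function. This is a minimal illustration of the rule as stated, using the factors 2 and 1/2; the function name winnow_update and its signature are our own, not from the text.

```python
def winnow_update(w, x, y, theta):
    """Apply one Winnow training step to the weight vector w, in place.

    w     -- list of positive weights, one per dimension
    x     -- 0/1 feature vector of the same length as w
    y     -- the class label, +1 or -1
    theta -- the positive threshold (the text takes theta = d)
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot <= theta and y == +1:
        # Case (2): weights for the 1-components of x are too low; double them.
        for i, xi in enumerate(x):
            if xi == 1:
                w[i] *= 2
    elif dot >= theta and y == -1:
        # Case (3): weights for the 1-components of x are too high; halve them.
        for i, xi in enumerate(x):
            if xi == 1:
                w[i] /= 2
    # Case (1): otherwise the example was classified correctly; w is unchanged.
    return w
```

Note that only the components where x has a 1 are ever touched, and each is multiplied by a positive factor, so starting from all-positive weights the vector stays positive, as claimed at the start of this section.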
Start with the weight vector w = [1, 1, 1, 1, 1] and let θ = 5. First, consider the feature vector a = [1, 1, 0, 1, 1]. We find w.a = 4, which is less than θ. Since the associated label for a is +1, this example is misclassified. When a +1-labeled example is misclassified, we must double all the components of w where the example has 1; in this case, all but the third component of a is 1. Thus, the new value of w is [2, 2, 1, 2, 2].
Next, we consider the training example b = [0, 0, 1, 1, 0]. Now w.b = 3, which is less than θ. However, the associated label for b is −1, so no change to w is needed.
For c = [0, 1, 1, 0, 0] we find w.c = 3 < θ, while the associated label is +1. Thus, we double the components of w where the corresponding components of c are 1. These components are the second and third, so the new value of w is [2, 4, 2, 2, 2].
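The worked example can be replayed in a few lines of code. The loop below inlines the update rule with factors 2 and 1/2, starting from w = [1, 1, 1, 1, 1] and θ = 5 as above; the variable names are ours.

```python
theta = 5
w = [1, 1, 1, 1, 1]

# The three training examples a, b, c from the text, with their labels.
training = [
    ([1, 1, 0, 1, 1], +1),  # a: w.a = 4 <= theta, label +1 -> double
    ([0, 0, 1, 1, 0], -1),  # b: w.b = 3 <  theta, label -1 -> no change
    ([0, 1, 1, 0, 0], +1),  # c: w.c = 3 <= theta, label +1 -> double
]

for x, y in training:
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot <= theta and y == +1:
        # Double the weights where x has 1.
        w = [2 * wi if xi == 1 else wi for wi, xi in zip(w, x)]
    elif dot >= theta and y == -1:
        # Halve the weights where x has 1.
        w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x)]

print(w)  # the final weight vector [2, 4, 2, 2, 2] from the text
```

Running this reproduces the sequence of weight vectors traced above: [2, 2, 1, 2, 2] after a, unchanged after b, and [2, 4, 2, 2, 2] after c.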