12.2 Perceptrons
A perceptron is a linear binary classifier. Its input is a vector x = [x1, x2, . . . , xd] with real-valued components. Associated with the perceptron is a vector of weights w = [w1, w2, . . . , wd], also with real-valued components. Each perceptron has a threshold θ. The output of the perceptron is +1 if w.x > θ, and the output is −1 if w.x < θ. The special case where w.x = θ will always be regarded as “wrong,” in the sense that we shall describe in detail when we get to Section 12.2.1.
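To make the decision rule concrete, here is a minimal Python sketch; the function name perceptron_output and the particular values of w, x, and θ below are illustrative assumptions, not taken from the text.

# Sketch of the perceptron decision rule: +1 if w.x > theta, -1 if w.x < theta.
# The boundary case w.x = theta is always regarded as "wrong," per the text;
# here it is signaled by returning 0.
def perceptron_output(w, x, theta):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot > theta:
        return +1
    if dot < theta:
        return -1
    return 0  # w.x = theta: treated as a misclassification

# Hypothetical 2-dimensional example.
w = [2.0, -1.0]
x = [1.5, 0.5]
theta = 1.0
print(perceptron_output(w, x, theta))  # w.x = 2(1.5) - 1(0.5) = 2.5 > 1.0, so +1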
The weight vector w defines a hyperplane of dimension d − 1: the set of all points x such that w.x = θ, as suggested in Fig. 12.4. Points on the positive side of the hyperplane are classified +1 and those on the negative side are classified −1. A perceptron classifier works only for data that is linearly separable, in the sense that there is some hyperplane that separates all the positive points from all the negative points. If there are many such hyperplanes, the perceptron will converge to one of them, and thus will correctly classify all the training points. If no such hyperplane exists, then the perceptron cannot converge to any particular one. In the next section, we discuss support-vector machines, which do not have this limitation; they will converge to some separator that, although not a perfect classifier, will do as well as possible under the metric to be described in Section 12.3.
Figure 12.4 A perceptron divides a space by a hyperplane into two half-spaces
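To see what linear separability requires, the following Python sketch checks whether a candidate hyperplane (w, θ) places every positive example strictly on the positive side and every negative example strictly on the negative side. The data set and the hyperplane here are hypothetical.

# A hyperplane (w, theta) separates the data if y*(w.x - theta) > 0 for every
# labeled point (x, y); a point exactly on the hyperplane counts as a failure.
def separates(w, theta, points, labels):
    for x, y in zip(points, labels):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        if y * (dot - theta) <= 0:  # wrong side, or on the hyperplane itself
            return False
    return True

# Hypothetical linearly separable data in two dimensions.
points = [[2.0, 1.0], [3.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
labels = [+1, +1, -1, -1]
print(separates([1.0, 1.0], 0.5, points, labels))  # True for this w and theta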
12.2.1 Training a Perceptron with Zero Threshold
To train a perceptron, we examine the training set and try to find a weight vector w and threshold θ such that all the feature vectors with y = +1 (the positive examples) are on the positive side of the hyperplane and all those with y = −1 (the negative examples) are on the negative side. It may or may not be possible to do so, since there is no guarantee that any hyperplane separates all the positive and negative examples in the training set.
We begin by assuming the threshold is 0; the simple augmentation needed to handle an unknown threshold is discussed in Section 12.2.4. The following method will converge to some hyperplane that separates the positive and negative examples, provided one exists.
(1) Initialize the weight vector w to all 0's.
(2) Pick a learning-rate parameter η, a small, positive real number.
(3) Consider each training example t = (x, y) in turn. Let y′ = w.x. If y′ and y have the same sign, do nothing; t is properly classified. But if y′ and y have different signs, or y′ = 0, replace w by w + ηyx; that is, adjust w slightly in the direction of x. Repeat this step until every training example is classified correctly.
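Putting these steps together, here is a minimal Python sketch of the zero-threshold training loop. The learning rate, the toy data, and the cap on the number of passes are illustrative assumptions; the loop is guaranteed to terminate only when the training data is linearly separable.

# Sketch of perceptron training with threshold 0. Whenever an example is
# misclassified (y' and y differ in sign, or y' = 0), the update rule
# w := w + eta*y*x nudges w in the direction of x.
def train_perceptron(points, labels, eta=0.1, max_passes=100):
    w = [0.0] * len(points[0])            # step (1): all-0 weight vector
    for _ in range(max_passes):
        converged = True
        for x, y in zip(points, labels):  # step (3): examine each example
            y_prime = sum(wi * xi for wi, xi in zip(w, x))
            if y * y_prime <= 0:          # wrong sign, or y' = 0
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                converged = False
        if converged:                     # every example correctly classified
            return w
    return w

# Hypothetical data separable by a hyperplane through the origin.
points = [[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, 0.0]]
labels = [+1, +1, -1, -1]
print(train_perceptron(points, labels))

One observation about this particular sketch: since w starts at all 0's, changing η merely scales every weight by the same constant, so it does not change which updates are made or the final classifications.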