12.2 Perceptrons
A perceptron is a linear binary classifier. Its input is a vector x = [x1, x2, . . . , xd] with real-valued components. Associated with the perceptron is a vector of weights w = [w1, w2, . . . , wd], also with real-valued components. Each perceptron has a threshold θ. The output of the perceptron is +1 if w.x > θ, and the output is −1 if w.x < θ. The special case where w.x = θ will always be regarded as “wrong,” in the sense that we shall describe in detail when we get to Section 12.2.1.
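To make the decision rule concrete, here is a minimal Python sketch; the function name perceptron_output and the particular values of w, x, and θ below are illustrative assumptions, not taken from the text.

# Sketch of the perceptron decision rule: +1 if w.x > theta, -1 if w.x < theta.
# The boundary case w.x = theta is always regarded as "wrong," per the text;
# here it is signaled by returning 0.
def perceptron_output(w, x, theta):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    if dot > theta:
        return +1
    if dot < theta:
        return -1
    return 0  # w.x = theta: treated as a misclassification

# Hypothetical 2-dimensional example.
w = [2.0, -1.0]
x = [1.5, 0.5]
theta = 1.0
print(perceptron_output(w, x, theta))  # w.x = 2(1.5) - 1(0.5) = 2.5 > 1.0, so +1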
The weight vector w defines a hyperplane of dimension d − 1: the set of all points x such that w.x = θ, as suggested in Fig. 12.4. Points on the positive side of the hyperplane are classified +1 and those on the negative side are classified −1. A perceptron classifier works only for data that is linearly separable, in the sense that there is some hyperplane that separates all the positive points from all the negative points. If there are many such hyperplanes, the perceptron will converge to one of them, and thus will correctly classify all the training points. If no such hyperplane exists, then the perceptron cannot converge to any particular one. In the next section, we discuss support-vector machines, which do not have this limitation; they will converge to some separator that, although not a perfect classifier, will do as well as possible under the metric to be described in Section 12.3.
Figure 12.4 A perceptron divides a space by a hyperplane into two half-spaces
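To see what linear separability requires, the following Python sketch checks whether a candidate hyperplane (w, θ) places every positive example strictly on the positive side and every negative example strictly on the negative side. The data set and the hyperplane here are hypothetical.

# A hyperplane (w, theta) separates the data if y*(w.x - theta) > 0 for every
# labeled point (x, y); a point exactly on the hyperplane counts as a failure.
def separates(w, theta, points, labels):
    for x, y in zip(points, labels):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        if y * (dot - theta) <= 0:  # wrong side, or on the hyperplane itself
            return False
    return True

# Hypothetical linearly separable data in two dimensions.
points = [[2.0, 1.0], [3.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
labels = [+1, +1, -1, -1]
print(separates([1.0, 1.0], 0.5, points, labels))  # True for this w and theta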
12.2.1 Training a Perceptron with Zero Threshold
To train a perceptron, we examine the training set and try to find a weight vector w and threshold θ such that all the feature vectors with y = +1 (the positive examples) are on the positive side of the hyperplane and all those with y = −1 (the negative examples) are on the negative side. It may or may not be possible to do so, since there is no guarantee that any hyperplane separates all the positive and negative examples in the training set.
We begin by assuming the threshold is 0; the simple augmentation needed to handle an unknown threshold is discussed in Section 12.2.4. The following method will converge to some hyperplane that separates the positive and negative examples, provided one exists.
(1) Initialize the weight vector w to all 0's.
(2) Pick a learning-rate parameter η, a small, positive real number.
(3) Consider each training example t = (x, y) in turn. Let y′ = w.x. If y′ and y have the same sign, do nothing; t is properly classified. But if y′ and y have different signs, or y′ = 0, replace w by w + ηyx; that is, adjust w slightly in the direction of x. Repeat this step until every training example is classified correctly.
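Putting these steps together, here is a minimal Python sketch of the zero-threshold training loop. The learning rate, the toy data, and the cap on the number of passes are illustrative assumptions; the loop is guaranteed to terminate only when the training data is linearly separable.

# Sketch of perceptron training with threshold 0. Whenever an example is
# misclassified (y' and y differ in sign, or y' = 0), the update rule
# w := w + eta*y*x nudges w in the direction of x.
def train_perceptron(points, labels, eta=0.1, max_passes=100):
    w = [0.0] * len(points[0])            # step (1): all-0 weight vector
    for _ in range(max_passes):
        converged = True
        for x, y in zip(points, labels):  # step (3): examine each example
            y_prime = sum(wi * xi for wi, xi in zip(w, x))
            if y * y_prime <= 0:          # wrong sign, or y' = 0
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                converged = False
        if converged:                     # every example correctly classified
            return w
    return w

# Hypothetical data separable by a hyperplane through the origin.
points = [[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, 0.0]]
labels = [+1, +1, -1, -1]
print(train_perceptron(points, labels))

One observation about this particular sketch: since w starts at all 0's, changing η merely scales every weight by the same constant, so it does not change which updates are made or the final classifications.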