Large-Scale Machine Learning - Mining of Massive Datasets

Figure 12.11 A training set may not allow the existence of any separating hyperplane

One might argue that, based on the observations of Section 12.2.6, it should be possible to find some function on the points that would transform them to another space where they were linearly separable. That might be the case, but if so, it would probably be an example of overfitting, the situation where the classifier works very well on the training set, because it has been carefully designed to handle each training example correctly. However, because the classifier is exploiting details of the training set that do not apply to other examples that must be classified in the future, the classifier will not perform well on new data.

Another problem is illustrated in Fig. 12.12. Usually, if classes can be separated by one hyperplane, then there are many different hyperplanes that will separate the points. However, not all hyperplanes are equally good. For instance, if we choose the hyperplane that is furthest clockwise, then the point indicated by “?” will be classified as a circle, even though we intuitively see it as closer to the squares. When we meet support-vector machines in Section 12.3, we shall see that there is a way to insist that the hyperplane chosen be the one that in a sense divides the space most fairly.

Figure 12.12 Generally, more than one hyperplane can separate the classes if they can be separated at all
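The point about many separating hyperplanes can be made concrete with a small sketch. Everything in it is an assumption chosen for illustration, not data taken from the figure: the four training points, the encoding of circles as +1 and squares as -1, the two weight vectors, and the query point standing in for “?”. Both hyperplanes classify the whole training set correctly, yet they disagree on the query point.

```python
import numpy as np

# Hypothetical training set (not the points of Fig. 12.12):
# circles are labeled +1, squares are labeled -1.
X = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 1.0], [4.0, 2.0]])
y = np.array([1, 1, -1, -1])

def classify(w, b, points):
    """Return +1 where w.x + b > 0 and -1 otherwise."""
    return np.where(points @ w + b > 0, 1, -1)

# Two hypothetical hyperplanes of the form w.x + b = 0.
w_a, b_a = np.array([-1.0, 1.0]), 0.0
w_b, b_b = np.array([-1.0, 2.0]), -1.0

# Both hyperplanes separate the training set perfectly.
separates_a = bool(np.all(classify(w_a, b_a, X) == y))
separates_b = bool(np.all(classify(w_b, b_b, X) == y))

# A query point playing the role of "?" in the text.
query = np.array([[3.5, 3.0]])
label_a = int(classify(w_a, b_a, query)[0])
label_b = int(classify(w_b, b_b, query)[0])

print("A separates training set:", separates_a, "| label for query:", label_a)
print("B separates training set:", separates_b, "| label for query:", label_b)
```

Since the training set alone cannot tell the two hyperplanes apart, some additional criterion is needed to choose between them, which is the role the text assigns to Section 12.3.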

Yet another problem is illustrated by Fig. 12.13. Most rules for training a perceptron stop as soon as there are no misclassified points. As a result, the chosen hyperplane will be one that just manages to classify some of the points correctly. For instance, the upper line in Fig. 12.13 has just managed to accommodate two of the squares, and the lower line has just managed to accommodate one of the circles. If either of these lines represents the final weight vector, then the weights are biased toward one of the classes. That is, they correctly classify the points in the training set, but the upper line would classify new squares that are just below it as circles, while the lower line would classify circles just above it as squares. Again, a more equitable choice of separating hyperplane will be shown in Section 12.3.
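The stopping behavior described above can be illustrated with a minimal perceptron sketch. It uses a plain update rule with a bias term and a learning rate of 1, which is one common variant and not necessarily the exact rule used elsewhere in this chapter. The training points, the label encoding (circles as +1, squares as -1), and the names `eta` and `max_epochs` are all assumptions made for the example. Training halts after the first pass with no misclassified point, and the code then measures how close each class lies to the resulting hyperplane.

```python
import numpy as np

# Hypothetical training set: circles are +1, squares are -1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.0, 0.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weight vector
b = 0.0           # bias (negative of the threshold)
eta = 1.0         # learning rate, an assumed value
max_epochs = 100  # safety cap, an assumed value

for epoch in range(max_epochs):
    mistakes = 0
    for x_i, y_i in zip(X, y):
        # A point is misclassified if it lies on the wrong side
        # of the hyperplane or exactly on it.
        if y_i * (w @ x_i + b) <= 0:
            w = w + eta * y_i * x_i
            b = b + eta * y_i
            mistakes += 1
    if mistakes == 0:
        # Stop as soon as a full pass makes no mistakes.
        break

predictions = np.where(X @ w + b > 0, 1, -1)

# Distance from each training point to the hyperplane w.x + b = 0.
distances = np.abs(X @ w + b) / np.linalg.norm(w)
nearest_circle = distances[y == 1].min()
nearest_square = distances[y == -1].min()

print("weights:", w, "bias:", b)
print("nearest circle distance:", nearest_circle)
print("nearest square distance:", nearest_square)
```

On this toy data the nearest circle and the nearest square end up at different distances from the final hyperplane, so the boundary sits closer to one class than the other even though every training point is classified correctly.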
