there is a linear separator, or at least a hyperplane that approximately separates the classes.
However, we can separate points by a nonlinear boundary if we first transform the points
so that the separator becomes linear. The model is expressed by a vector, the normal to the
separating hyperplane. Since this vector is often of very high dimension, it can be very hard
to interpret the model.
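For instance, points inside and outside a circle cannot be separated by a line in two dimensions, but adding the feature x^2 + y^2 makes a linear separator possible. The sketch below illustrates the idea; the random data, the particular transform, and the use of scikit-learn's LinearSVC are assumptions made for illustration, not anything prescribed here.

```python
# A minimal sketch of the transform-then-separate idea (assumes numpy and
# scikit-learn are installed; the data and the transform are illustrative).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))            # random 2-D points
y = (X[:, 0]**2 + X[:, 1]**2 > 0.5).astype(int)      # label: outside a circle?

# No line separates the classes well in the original 2-D space.
flat = LinearSVC(max_iter=10000).fit(X, y)
print("accuracy in 2-D:", flat.score(X, y))

# Transform (x1, x2) -> (x1, x2, x1^2 + x2^2); now a plane separates the classes.
X3 = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])
lifted = LinearSVC(max_iter=10000).fit(X3, y)
print("accuracy after transform:", lifted.score(X3, y))
print("normal to the separating hyperplane:", lifted.coef_[0])
```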
Nearest-Neighbor Classification and Regression Here, the model is the training set itself,
so we expect it to be intuitively understandable. The approach can deal with multidimen-
sional data, although the larger the number of dimensions, the sparser the training set will
be, and therefore the less likely it is that we shall find a training point very close to the point
we need to classify. That is, the “curse of dimensionality” makes nearest-neighbor meth-
ods questionable in high dimensions. These methods are really only useful for numerical
features, although one could allow categorical features with a small number of values. For
instance, a binary categorical feature like {male, female} could have the values replaced by
0 and 1, so that there is no distance in this dimension between individuals of the same gender
and a distance of 1 between other pairs of individuals. However, three or more values cannot be
assigned numbers that are equidistant. Finally, nearest-neighbor methods have many para-
meters to set, including the distance measure we use (e.g., cosine or Euclidean), the number
of neighbors to choose, and the kernel function to use. Different choices result in different
classifications, and in many cases it is not obvious which choices yield the best results.
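As an illustration of these choices, here is a small sketch of nearest-neighbor classification in which a binary categorical feature is encoded as 0/1, Euclidean distance is used, and k = 3 neighbors vote. The toy data, the distance measure, and the value of k are all assumptions made for the example.

```python
# A minimal nearest-neighbor sketch (toy data; Euclidean distance and k = 3 are
# arbitrary choices made for illustration).
import numpy as np

# Each row is [age in years, gender encoded as 0 or 1]; labels are class 0 or 1.
train_X = np.array([[25, 0], [32, 1], [47, 0], [51, 1], [62, 0]], dtype=float)
train_y = np.array([0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(train_X - x, axis=1)
    # Take the labels of the k closest training points; return the majority label.
    nearest = train_y[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([45.0, 1.0])))
```

In practice the numerical features would also be scaled, so that the 0/1 dimension is not dwarfed by, say, age measured in years.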
Decision Trees We have not discussed this commonly used method in this chapter, al-
though it was introduced briefly in Section 9.2.7. Unlike the methods of this chapter, de-
cision trees are useful for both categorical and numerical features. The models produced
are generally quite understandable, since each decision is represented by one node of the
tree. However, this approach is only useful for low-dimensional feature vectors. The reason
is that building decision trees with many levels leads to overfitting, where below the top
levels, the decisions are based on peculiarities of small fractions of the training set, rather
than fundamental properties of the data. But if a decision tree has few levels, then it cannot
even mention more than a small number of features. As a result, the best use of decision
trees is often to create an ensemble of many low-depth trees and combine their decisions in
some way.
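The following sketch contrasts a single deep tree with an ensemble of shallow trees whose votes are combined. The synthetic data and the use of scikit-learn's RandomForestClassifier are assumptions made for illustration, not the specific ensemble method intended by the text.

```python
# A minimal sketch (scikit-learn assumed available; data is synthetic): a single
# deep tree fits the training set very closely, while an ensemble of many
# shallow trees combines their votes and typically generalizes better.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep tree        train/test accuracy:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow ensemble train/test accuracy:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```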
12.6 Summary of Chapter 12
Training Sets: A training set consists of examples, each of which is a feature vector together with a label indicating
the class to which the object represented by the feature vector belongs. Each component of a feature vector is a
feature, and features can be categorical, belonging to an enumerated list of values, or numerical.
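As a concrete, purely illustrative rendering of this definition, a training set can be held as a list of (feature-vector, label) pairs; the fruit data below is invented for the example.

```python
# A purely illustrative representation of a training set: each example pairs a
# feature vector with a label. "color" is a categorical feature; "weight" is numerical.
training_set = [
    ({"color": "red",    "weight": 1.2}, "apple"),
    ({"color": "yellow", "weight": 1.4}, "banana"),
    ({"color": "red",    "weight": 1.1}, "apple"),
]
for features, label in training_set:
    print(features, "->", label)
```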