there is a linear separator, or at least a hyperplane that approximately separates the classes.
However, we can separate points by a nonlinear boundary if we first transform the points
so that the separator becomes linear. The model is expressed by a vector, the normal to the
separating hyperplane. Since this vector is often of very high dimension, it can be very hard
to interpret the model.
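For instance, points inside and outside a circle cannot be separated by a line in two dimensions, but adding the feature x^2 + y^2 makes a linear separator possible. The sketch below illustrates the idea; the random data, the particular transform, and the use of scikit-learn's LinearSVC are assumptions made for illustration, not anything prescribed here.

```python
# A minimal sketch of the transform-then-separate idea (assumes numpy and
# scikit-learn are installed; the data and the transform are illustrative).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))            # random 2-D points
y = (X[:, 0]**2 + X[:, 1]**2 > 0.5).astype(int)      # label: outside a circle?

# No line separates the classes well in the original 2-D space.
flat = LinearSVC(max_iter=10000).fit(X, y)
print("accuracy in 2-D:", flat.score(X, y))

# Transform (x1, x2) -> (x1, x2, x1^2 + x2^2); now a plane separates the classes.
X3 = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])
lifted = LinearSVC(max_iter=10000).fit(X3, y)
print("accuracy after transform:", lifted.score(X3, y))
print("normal to the separating hyperplane:", lifted.coef_[0])
```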
Nearest-Neighbor Classification and Regression Here, the model is the training set itself,
so we expect it to be intuitively understandable. The approach can deal with multidimen-
sional data, although the larger the number of dimensions, the sparser the training set will
be, and therefore the less likely it is that we shall find a training point very close to the point
we need to classify. That is, the “curse of dimensionality” makes nearest-neighbor meth-
ods questionable in high dimensions. These methods are really only useful for numerical
features, although one could allow categorical features with a small number of values. For
instance, a binary categorical feature like {male, female} could have the values replaced by
0 and 1, so that there is no distance in this dimension between individuals of the same gender
and a distance of 1 between other pairs of individuals. However, three or more values cannot be
assigned numbers that are equidistant. Finally, nearest-neighbor methods have many para-
meters to set, including the distance measure we use (e.g., cosine or Euclidean), the number
of neighbors to choose, and the kernel function to use. Different choices result in different
classifications, and in many cases it is not obvious which choices yield the best results.
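As an illustration of these choices, here is a small sketch of nearest-neighbor classification in which a binary categorical feature is encoded as 0/1, Euclidean distance is used, and k = 3 neighbors vote. The toy data, the distance measure, and the value of k are all assumptions made for the example.

```python
# A minimal nearest-neighbor sketch (toy data; Euclidean distance and k = 3 are
# arbitrary choices made for illustration).
import numpy as np

# Each row is [age in years, gender encoded as 0 or 1]; labels are class 0 or 1.
train_X = np.array([[25, 0], [32, 1], [47, 0], [51, 1], [62, 0]], dtype=float)
train_y = np.array([0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(train_X - x, axis=1)
    # Take the labels of the k closest training points; return the majority label.
    nearest = train_y[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([45.0, 1.0])))
```

In practice the numerical features would also be scaled, so that the 0/1 dimension is not dwarfed by, say, age measured in years.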
Decision Trees We have not discussed this commonly used method in this chapter, al-
though it was introduced briefly in Section 9.2.7. Unlike the methods of this chapter, de-
cision trees are useful for both categorical and numerical features. The models produced
are generally quite understandable, since each decision is represented by one node of the
tree. However, this approach is only useful for low-dimensional feature vectors. The reason
is that building decision trees with many levels leads to overfitting, where below the top
levels, the decisions are based on peculiarities of small fractions of the training set, rather
than fundamental properties of the data. But if a decision tree has few levels, then it cannot
even mention more than a small number of features. As a result, the best use of decision
trees is often to create an ensemble of many low-depth trees and combine their decisions in
some way.
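The following sketch contrasts a single deep tree with an ensemble of shallow trees whose votes are combined. The synthetic data and the use of scikit-learn's RandomForestClassifier are assumptions made for illustration, not the specific ensemble method intended by the text.

```python
# A minimal sketch (scikit-learn assumed available; data is synthetic): a single
# deep tree fits the training set very closely, while an ensemble of many
# shallow trees combines their votes and typically generalizes better.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep tree        train/test accuracy:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow ensemble train/test accuracy:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```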
12.6 Summary of Chapter 12
Training Sets: A training set consists of examples, each of which is a feature vector together with a label indicating
the class to which the object represented by the feature vector belongs. Each component of a feature vector is a
feature, and features can be categorical, belonging to an enumerated list of values, or numerical.
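As a concrete, purely illustrative rendering of this definition, a training set can be held as a list of (feature-vector, label) pairs; the fruit data below is invented for the example.

```python
# A purely illustrative representation of a training set: each example pairs a
# feature vector with a label. "color" is a categorical feature; "weight" is numerical.
training_set = [
    ({"color": "red",    "weight": 1.2}, "apple"),
    ({"color": "yellow", "weight": 1.4}, "banana"),
    ({"color": "red",    "weight": 1.1}, "apple"),
]
for features, label in training_set:
    print(features, "->", label)
```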