Test Sets and Overfitting : When training some classifier on a training set, it is useful to remove some of the training
set and use the removed data as a test set. After producing a model or classifier without using the test set, we can run
the classifier on the test set to see how well it does. If the classifier does not perform as well on the test set as on the
training set used, then we have overfit the training set by conforming to peculiarities of the training-set data that are not present in the data as a whole.
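As a concrete illustration (not from the text), one simple way to carve out a test set is to hold back a random fraction of the labeled examples; the function name and the 20% split below are arbitrary choices.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Hold back a random fraction of the labeled data to serve as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return X[train], y[train], X[test], y[test]

# Train a classifier on (X_train, y_train) only, then compare its accuracy on
# the training set with its accuracy on (X_test, y_test); a noticeably worse
# score on the test set is the signature of overfitting.
```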
Batch Versus On-Line Learning : In batch learning, the training set is available at any time and can be used in re-
peated passes. On-line learning uses a stream of training examples, each of which can be used only once.
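The contrast is easy to state in code; the sketch below is schematic, with `update` standing in for whatever per-example adjustment the learning method makes (for the perceptron, the rule described next).

```python
def batch_learning(model, examples, update, passes=10):
    # Batch: the whole training set is on hand, so we may sweep over it repeatedly.
    for _ in range(passes):
        for x, y in examples:
            model = update(model, x, y)
    return model

def online_learning(model, stream, update):
    # On-line: examples arrive as a stream and each one is used exactly once.
    for x, y in stream:
        model = update(model, x, y)
    return model
```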
Perceptrons : This machine-learning method assumes the training set has only two class labels, positive and negative.
Perceptrons work when there is a hyperplane that separates the feature vectors of the positive examples from those
of the negative examples. We converge to that hyperplane by repeatedly adjusting our estimate of the hyperplane by a fraction, called the learning rate, of the direction given by the average of the currently misclassified points.
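A minimal sketch of this rule, assuming labels are +1 and -1 and the threshold is absorbed into the weight vector as a constant bias component; the learning rate and iteration limit below are arbitrary.

```python
import numpy as np

def perceptron(X, y, learning_rate=0.1, max_steps=1000):
    """X: rows are feature vectors (include a constant 1 component for the bias).
    y: labels +1 or -1.  Returns a weight vector w defining the hyperplane w.x = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(max_steps):
        wrong = (X @ w) * y <= 0                 # currently misclassified points
        if not wrong.any():
            break                                # every point is on the correct side
        # Move w by a fraction of the average direction of the misclassified points.
        w += learning_rate * (X[wrong] * y[wrong][:, None]).mean(axis=0)
    return w
```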
The Winnow Algorithm : This algorithm is a variant of the perceptron algorithm that requires components of the fea-
ture vectors to be 0 or 1. Training examples are examined in a round-robin fashion, and if the current classification
of a training example is incorrect, the components of the estimated separator where the feature vector has 1 are ad-
justed up or down, in the direction that will make it more likely this training example is correctly classified in the
next round.
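A sketch of one common version of Winnow, with labels +1 and -1, all weights starting at 1, the threshold set to the number of features, and promotion or demotion by a factor of 2; these particular constants are conventional choices rather than requirements.

```python
import numpy as np

def winnow(X, y, factor=2.0, max_rounds=100):
    """X: rows are 0/1 feature vectors.  y: labels +1 or -1.
    Predict +1 when w.x exceeds the threshold theta."""
    d = X.shape[1]
    w = np.ones(d)
    theta = float(d)
    for _ in range(max_rounds):
        mistakes = 0
        for x, label in zip(X, y):               # round-robin over the training set
            prediction = 1 if w @ x > theta else -1
            if prediction != label:
                mistakes += 1
                if label > 0:
                    w[x == 1] *= factor          # promote weights where the feature is 1
                else:
                    w[x == 1] /= factor          # demote weights where the feature is 1
        if mistakes == 0:
            break
    return w, theta
```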
Nonlinear Separators : When no linear function separates the two classes of training points, it may still be possible to use a perceptron to classify them. We must find a function that transforms the points so that
in the transformed space, the separator is a hyperplane.
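For example, points labeled by whether they fall inside a circle centered at the origin cannot be separated by a line, but adding the squared radius as an extra coordinate makes them separable by a plane; the specific mapping below is just one illustrative choice.

```python
import numpy as np

def add_radius_squared(X):
    """Map each 2-D point (x1, x2) to (x1, x2, x1^2 + x2^2)."""
    r2 = (X ** 2).sum(axis=1, keepdims=True)
    return np.hstack([X, r2])

# A perceptron trained on add_radius_squared(X) can now separate inside-the-circle
# points from outside-the-circle points with a plane in the 3-D space.
```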
Support-Vector Machines : The SVM improves upon perceptrons by finding a separating hyperplane that not only
separates the positive and negative points, but does so in a way that maximizes the margin, the perpendicular distance from the hyperplane to the nearest points. The points that lie exactly at this minimum distance are the support
vectors. Alternatively, the SVM can be designed to allow points that are too close to the hyperplane, or even on the
wrong side of the hyperplane, but minimize the error due to such misplaced points.
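As an outside illustration, scikit-learn's SVC with a linear kernel fits such a maximal-margin separator and exposes the support vectors it found; the tiny data set and the value of C below are made up for the example.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # positive examples
              [2.0, 0.5], [3.0, 1.0], [4.0, 1.5]])  # negative examples
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print(clf.coef_, clf.intercept_)      # the separating hyperplane w.x + b = 0
print(clf.support_vectors_)           # the points lying closest to the hyperplane
```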
Solving the SVM Equations : We can set up a function of the vector that is normal to the hyperplane, the length of the
vector (which determines the margin), and the penalty for points on the wrong side of the margins. The regulariza-
tion parameter determines the relative importance of a wide margin and a small penalty. The equations can be solved
by several methods, including gradient descent and quadratic programming.
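A minimal numpy sketch of the gradient-descent route, using the usual hinge-loss form of the objective, 0.5*||w||^2 + C * sum(max(0, 1 - y(w.x + b))); the step size and step count are arbitrary.

```python
import numpy as np

def svm_gradient_descent(X, y, C=1.0, learning_rate=0.01, steps=1000):
    """Minimize 0.5*||w||**2 + C * sum of hinge penalties by gradient descent.
    C is the regularization parameter trading margin width against penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        margins = y * (X @ w + b)
        viol = margins < 1                        # points inside the margin or misclassified
        grad_w = w - C * (X[viol] * y[viol][:, None]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```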
Nearest-Neighbor Learning : In this approach to machine learning, the entire training set is used as the model. For
each (“query”) point to be classified, we search for its k nearest neighbors in the training set. The classification of
the query point is some function of the labels of these k neighbors. The simplest case is when k = 1, in which case
we can take the label of the query point to be the label of the nearest neighbor.
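A minimal sketch, assuming Euclidean distance and a majority vote as the "some function of the labels"; with k = 1 it reduces to copying the nearest neighbor's label.

```python
import numpy as np
from collections import Counter

def knn_classify(query, X, labels, k=3):
    """Label the query point by a majority vote among its k nearest neighbors."""
    distances = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```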
Regression : A common case of nearest-neighbor learning, called regression, occurs when there is only one feature, and it, as well as the label, are real numbers; i.e., the data defines a real-valued function of one variable.
To estimate the label, i.e., the value of the function, for an unlabeled data point, we can perform some computation
involving the k nearest neighbors. Examples include averaging the neighbors or taking a weighted average, where
the weight of a neighbor is some decreasing function of its distance from the point whose label we are trying to
determine.
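A sketch of the weighted-average variant, using the inverse of the distance as the decreasing weight function; both the choice of k and the particular weighting are illustrative.

```python
import numpy as np

def knn_regress(x0, xs, ys, k=3):
    """Estimate the function value at x0 from the k nearest training points,
    weighting each neighbor by the inverse of its distance from x0."""
    d = np.abs(xs - x0)
    nearest = np.argsort(d)[:k]
    weights = 1.0 / (d[nearest] + 1e-9)           # nearer neighbors count more
    return float((weights * ys[nearest]).sum() / weights.sum())
```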
12.7 References for Chapter 12
The perceptron was introduced in [11]. [7] introduces the idea of maximizing the margin around the separating hyperplane. A well-known book on the subject is [10].
The Winnow algorithm is from [9]. Also see the analysis in [1].