“nigeria” are indicative of spam, while “of” is indicative of nonspam. It considers “and” and “the” neutral, although we would probably prefer to give “and,” “or,” and “the” the same weight.
12.2.2 Convergence of Perceptrons
As we mentioned at the beginning of this section, if the data points are linearly separable, then the perceptron algorithm will converge to a separator. However, if the data is not linearly separable, then the algorithm will eventually repeat a weight vector and loop infinitely. Unfortunately, it is often hard to tell, during the running of the algorithm, which of these two cases applies. When the data is large, it is not feasible to remember all previous weight vectors to see whether we are repeating a vector, and even if we could, the period of repetition would most likely be so large that we would want to terminate the algorithm long before it repeated.
A second issue regarding termination is that even if the training data is linearly separable, the entire dataset might not be linearly separable. The consequence is that there might not be any value in running the algorithm for a very large number of rounds in the hope of converging to a separator. We therefore need a strategy for deciding when to terminate the perceptron algorithm, assuming convergence has not occurred. Here are some common tests for termination; a code sketch combining two of them follows the list.
(1) Terminate after a fixed number of rounds.
(2) Terminate when the number of misclassified training points stops changing.
(3) Withhold a test set from the training data, and after each round, run the perceptron on
the test data. Terminate the algorithm when the number of errors on the test set stops
changing.
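As a concrete illustration, the following minimal sketch in Python with NumPy combines tests (1) and (3): it caps the number of rounds and stops when the error count on a withheld test set stops changing. The function name, the 20% test split, and the round cap are illustrative choices, not from the text.

```python
import numpy as np

def train_perceptron(X, y, max_rounds=100, test_fraction=0.2, eta=1.0, seed=0):
    """Perceptron training with early termination: stop after a fixed
    number of rounds (test 1) or when the error count on a withheld
    test set stops changing (test 3). Labels are assumed to be +1/-1."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(test_fraction * len(y))
    X_te, y_te = X[order[:n_test]], y[order[:n_test]]   # withheld test set
    X_tr, y_tr = X[order[n_test:]], y[order[n_test:]]   # remaining training set

    w = np.zeros(X.shape[1])
    prev_errors = None
    for _ in range(max_rounds):                  # test (1): fixed round cap
        for x_i, y_i in zip(X_tr, y_tr):
            if y_i * (w @ x_i) <= 0:             # point is misclassified
                w += eta * y_i * x_i             # standard perceptron update
        errors = int(np.sum(np.sign(X_te @ w) != y_te))
        if errors == prev_errors:                # test (3): test error unchanged
            break
        prev_errors = errors
    return w
```

Test (2) could be implemented the same way, by counting misclassified training points instead of test points.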
Another technique that will aid convergence is to lower the training rate as the number of rounds increases. For example, we could allow the training rate η to start at some initial η0 and lower it to η0/(1 + ct) after the t-th round, where c is some small constant.
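In the sketch above, this schedule is a one-line change: compute the round's rate and use it in place of the fixed eta. The fragment below just prints the decaying rate; η0 = 1 and c = 0.01 are arbitrary illustrative values.

```python
eta_0, c = 1.0, 0.01                   # illustrative initial rate and decay constant
for t in range(5):
    eta_t = eta_0 / (1 + c * t)        # training rate used for all updates in round t
    print(f"round {t}: eta = {eta_t:.4f}")
```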
12.2.3 The Winnow Algorithm
There are many other rules one could use to adjust weights for a perceptron. Not all possible algorithms are guaranteed to converge, even if there is a hyperplane separating positive and negative examples. One that does converge is called Winnow, and that rule will be described here. Winnow assumes that the feature vectors consist of 0's and 1's, and the labels are +1 or −1.
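As a preview, here is a sketch of one standard formulation of the Winnow update: weights initialized to 1, threshold fixed at the number of features d, and multiplicative factor 2. These particular constants are one common choice and may differ from the exact formulation described in what follows.

```python
import numpy as np

def train_winnow(X, y, max_rounds=100):
    """One common formulation of Winnow: 0/1 feature vectors,
    labels +1/-1, multiplicative weight updates."""
    d = X.shape[1]
    w = np.ones(d)                    # weights start at 1
    theta = d                         # threshold fixed at d (one common choice)
    for _ in range(max_rounds):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i >= theta else -1
            if pred != y_i:
                mistakes += 1
                if y_i == 1:          # false negative: boost active features' weights
                    w[x_i == 1] *= 2
                else:                 # false positive: shrink active features' weights
                    w[x_i == 1] /= 2
        if mistakes == 0:             # every training point classified correctly
            break
    return w
```

Because the updates are multiplicative, all weights stay positive; features correlated with negative examples end up with weights near zero rather than with negative weights.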