“nigeria” are indicative of spam, while “of” is indicative of nonspam. It considers “and” and “the” neutral, although we would probably prefer to give “and,” “or,” and “the” the same weight.
12.2.2 Convergence of Perceptrons
As we mentioned at the beginning of this section, if the data points are linearly separable, then the perceptron algorithm will converge to a separator. However, if the data is not linearly separable, then the algorithm will eventually repeat a weight vector and loop infinitely. Unfortunately, it is often hard to tell, during the running of the algorithm, which of these two cases applies. When the data is large, it is not feasible to remember all previous weight vectors to see whether we are repeating a vector, and even if we could, the period of repetition would most likely be so large that we would want to terminate the algorithm long before it repeated.
A second issue regarding termination is that even if the training data is linearly separable, the entire dataset might not be linearly separable. The consequence is that there might not be any value in running the algorithm for a very large number of rounds in the hope of converging to a separator. We therefore need a strategy for deciding when to terminate the perceptron algorithm, assuming convergence has not occurred. Here are some common tests for termination; a code sketch combining two of them follows the list.
(1) Terminate after a fixed number of rounds.
(2) Terminate when the number of misclassified training points stops changing.
(3) Withhold a test set from the training data, and after each round, run the perceptron on
the test data. Terminate the algorithm when the number of errors on the test set stops
changing.
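As a concrete illustration, the following minimal sketch in Python with NumPy combines tests (1) and (3): it caps the number of rounds and stops when the error count on a withheld test set stops changing. The function name, the 20% test split, and the round cap are illustrative choices, not from the text.

```python
import numpy as np

def train_perceptron(X, y, max_rounds=100, test_fraction=0.2, eta=1.0, seed=0):
    """Perceptron training with early termination: stop after a fixed
    number of rounds (test 1) or when the error count on a withheld
    test set stops changing (test 3). Labels are assumed to be +1/-1."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(test_fraction * len(y))
    X_te, y_te = X[order[:n_test]], y[order[:n_test]]   # withheld test set
    X_tr, y_tr = X[order[n_test:]], y[order[n_test:]]   # remaining training set

    w = np.zeros(X.shape[1])
    prev_errors = None
    for _ in range(max_rounds):                  # test (1): fixed round cap
        for x_i, y_i in zip(X_tr, y_tr):
            if y_i * (w @ x_i) <= 0:             # point is misclassified
                w += eta * y_i * x_i             # standard perceptron update
        errors = int(np.sum(np.sign(X_te @ w) != y_te))
        if errors == prev_errors:                # test (3): test error unchanged
            break
        prev_errors = errors
    return w
```

Test (2) could be implemented the same way, by counting misclassified training points instead of test points.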
Another technique that will aid convergence is to lower the training rate as the number of rounds increases. For example, we could allow the training rate η to start at some initial η0 and lower it to η0/(1 + ct) after the t-th round, where c is some small constant.
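In the sketch above, this schedule is a one-line change: compute the round's rate and use it in place of the fixed eta. The fragment below just prints the decaying rate; η0 = 1 and c = 0.01 are arbitrary illustrative values.

```python
eta_0, c = 1.0, 0.01                   # illustrative initial rate and decay constant
for t in range(5):
    eta_t = eta_0 / (1 + c * t)        # training rate used for all updates in round t
    print(f"round {t}: eta = {eta_t:.4f}")
```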
12.2.3 The Winnow Algorithm
There are many other rules one could use to adjust weights for a perceptron. Not all possible algorithms are guaranteed to converge, even if there is a hyperplane separating positive and negative examples. One that does converge is called Winnow, and that rule will be described here. Winnow assumes that the feature vectors consist of 0's and 1's, and the labels are +1 or −1.
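As a preview, here is a sketch of one standard formulation of the Winnow update: weights initialized to 1, threshold fixed at the number of features d, and multiplicative factor 2. These particular constants are one common choice and may differ from the exact formulation described in what follows.

```python
import numpy as np

def train_winnow(X, y, max_rounds=100):
    """One common formulation of Winnow: 0/1 feature vectors,
    labels +1/-1, multiplicative weight updates."""
    d = X.shape[1]
    w = np.ones(d)                    # weights start at 1
    theta = d                         # threshold fixed at d (one common choice)
    for _ in range(max_rounds):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i >= theta else -1
            if pred != y_i:
                mistakes += 1
                if y_i == 1:          # false negative: boost active features' weights
                    w[x_i == 1] *= 2
                else:                 # false positive: shrink active features' weights
                    w[x_i == 1] /= 2
        if mistakes == 0:             # every training point classified correctly
            break
    return w
```

Because the updates are multiplicative, all weights stay positive; features correlated with negative examples end up with weights near zero rather than with negative weights.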