w·a = 0. Since 0 is not positive, we move w in the direction of a by performing w := w + (1/2)(+1)a. The new value of w is thus w + a/2.
Pragmatics of Training on Emails
When we represent emails or other large documents as training examples, we would not really want to construct the
vector of 0's and 1's with a component for every word that appears even once in the collection of emails. Doing so
would typically give us sparse vectors with millions of components. Rather, create a table in which all the words appearing in the emails are assigned integers 1, 2, . . ., indicating their component. When we process an email in the
training set, make a list of the components in which the vector has 1; i.e., use the standard sparse representation for
the vector. If we eliminate stop words from the representation, or even eliminate words with a low TF.IDF score, then
we make the vectors representing emails significantly sparser and thus compress the data even more. Only the vector
w needs to have all its components listed, since it will not be sparse after a small number of training examples have
been processed.
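As a concrete illustration of this representation, a sketch along these lines in Python might look as follows. The stop-word set, the sample emails, and every name below are made up for the sketch and are not part of the text; components are numbered from 0 rather than 1 so they can index a Python list directly.

STOP_WORDS = {"the", "and", "of", "a", "to"}

def build_word_table(emails):
    """Assign each non-stop word an integer component: 0, 1, 2, ..."""
    table = {}
    for text in emails:
        for word in text.lower().split():
            if word not in STOP_WORDS and word not in table:
                table[word] = len(table)
    return table

def to_sparse(text, table):
    """Represent one email as the sorted list of components whose value is 1."""
    return sorted({table[w] for w in text.lower().split() if w in table})

def sparse_dot(w, components):
    """Dot product of the dense weight vector w with a sparse 0/1 vector."""
    return sum(w[i] for i in components)

emails = ["viagra offer for you", "meeting notes and the agenda"]
table = build_word_table(emails)
x = to_sparse(emails[0], table)   # list of components that are 1
w = [0.0] * len(table)            # only w is kept dense
print(table, x, sparse_dot(w, x))

Only the word table and the per-email lists of nonzero components are stored; the dot product against the dense w touches just those components, which is what makes the sparse representation pay off.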
Next, consider training example b; we compute w·b. Since the associated y for b is −1, b is misclassified. We thus assign w := w + (1/2)(−1)b.
Training example c is next. We compute w·c. Since the associated y for c is +1, c is also misclassified. We thus assign w := w + (1/2)(+1)c.
Training example d is next to be considered: we compute w·d. Since the associated y for d is −1, d is misclassified as well. We thus assign w := w + (1/2)(−1)d.
For training example e we compute w·e. Since the associated y for e is +1, e is classified correctly, and no change to w is made. Similarly, we compute w·f and find that f is correctly classified. If we check a through d, we find that this w correctly classifies
them as well. Thus, we have converged to a perceptron that classifies all the training set
examples correctly. It also makes a certain amount of sense: it says that “viagra” and “ni-
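The procedure traced above can be summarized in a short loop. The sketch below is one way to write it, using dense 0/1 vectors for clarity and the same learning rate of 1/2; the six training vectors and their labels are placeholders, not the actual examples a through f.

def train_perceptron(examples, eta=0.5):
    """Sweep the training set repeatedly, moving w on each misclassified
    example by w := w + eta * y * x, until a full pass makes no change."""
    w = [0.0] * len(examples[0][0])
    changed = True
    while changed:
        changed = False
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:          # misclassified; a score of 0 counts as wrong
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                changed = True
    return w

# Placeholder 0/1 word-presence vectors with labels +1 (spam) and -1 (not spam).
training_set = [
    ([1, 1, 0, 0], +1),
    ([1, 0, 1, 0], +1),
    ([1, 0, 0, 1], +1),
    ([0, 1, 0, 0], -1),
    ([0, 0, 1, 1], -1),
    ([0, 1, 1, 0], -1),
]
print(train_perceptron(training_set))

The loop is guaranteed to terminate only when the training data are linearly separable, as they are both in the text's example and in the placeholder set above.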