The designed perceptron has a threshold of 2. It has weights 2 and 1 for “viagra” and “nigeria” and smaller weights for “and” and “of.” It also has weight 1 for “the,” which suggests that “the” is as indicative of spam as “nigeria,” something we doubt is true. Nevertheless, this perceptron does classify all examples correctly.
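As a quick sketch of that decision rule in Python: the weights for “viagra,” “nigeria,” and “the” are those stated above, while the “smaller” weights for “and” and “of,” the treatment of the boundary case, and the two sample messages are assumptions made for illustration, not the book's training examples.

import itertools

# Decision rule of the perceptron described above: compare the dot
# product of the weight vector with the 0/1 word vector against the
# threshold.  Weights for "and"/"of" are assumed small values.
weights = {"viagra": 2, "nigeria": 1, "the": 1, "and": 0.5, "of": 0.5}
theta = 2

def looks_like_spam(words):
    score = sum(weights.get(w, 0) for w in words)
    return score >= theta   # treating the boundary case as positive (a convention choice)

print(looks_like_spam({"nigeria", "the"}))  # 1 + 1 = 2 >= 2  -> True
print(looks_like_spam({"the", "of"}))       # 1 + 0.5 < 2     -> False

Note that the first sample message is flagged as spam only because “the” carries as much weight as “nigeria,” which is exactly the oddity remarked on above.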
12.2.5 Multiclass Perceptrons
There are several ways in which the basic idea of the perceptron can be extended. We shall
discuss transformations that enable hyperplanes to serve for more complex boundaries in
the next section. Here, we look at how perceptrons can be used to classify data into many
classes.
Suppose we are given a training set with labels in k different classes. Start by training a perceptron for each class; these perceptrons should each have the same threshold θ. That is, for class i, treat a training example (x, i) as a positive example, and all examples (x, j), where j ≠ i, as negative examples. Suppose that the weight vector of the perceptron for class i is determined to be w_i after training.
Given a new vector x to classify, we compute w_i · x for all i = 1, 2, ..., k. We take the class of x to be the value of i for which w_i · x is the maximum, provided that value is at least θ. Otherwise, x is assumed not to belong to any of the k classes.
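A minimal sketch of this one-vs-rest scheme in Python with NumPy. The learning rate, epoch count, and update rule in the training loop are assumed details for the sketch; the text specifies only the one-perceptron-per-class setup with a shared threshold θ.

import numpy as np

def train_one_vs_rest(X, y, k, theta, eta=0.1, epochs=50):
    """Train one perceptron per class: for class i, examples labeled i
    are positive and all others negative.  Returns a k x d weight matrix."""
    n, d = X.shape
    W = np.zeros((k, d))
    for i in range(k):
        target = np.where(y == i, 1, -1)          # +1 for class i, -1 otherwise
        for _ in range(epochs):
            for x, t in zip(X, target):
                pred = 1 if W[i] @ x > theta else -1
                if pred != t:                     # standard perceptron update
                    W[i] += eta * t * x
    return W

def classify(W, x, theta):
    """Assign x to the class with the largest dot product w_i . x,
    provided that maximum is at least theta; otherwise return None,
    meaning x belongs to none of the k classes."""
    scores = W @ x
    best = int(np.argmax(scores))
    return best if scores[best] >= theta else None

Keeping the same fixed θ for every class during training is what makes the k scores comparable at classification time, so taking the maximum is meaningful.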
For example, suppose we want to classify Web pages into a number of topics, such as
sports, politics, medicine, and so on. We can represent Web pages by a vector with 1 for
each word present in the page and 0 for words not present (of course we would only visualize the pages that way; we wouldn't construct the vectors in reality). Each topic has certain
words that tend to indicate that topic. For instance, sports pages would be full of words
like “win,” “goal,” “played,” and so on. The weight vector for that topic would give higher
weights to the words that characterize that topic.
A new page could be classified as belonging to the topic that gives the highest score when the dot products of the page's vector with the weight vectors for the topics are computed. An alternative interpretation of the situation is to classify a page as belonging to all those topics for which the dot product is above some threshold (presumably a threshold higher than the θ used for training).
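Continuing the sketch above, the alternative multi-label reading changes only the final step; the threshold value passed in here is illustrative and, as noted, presumably higher than the training θ.

def topics_above(W, x, threshold):
    # Multi-label variant: a page belongs to every topic whose score
    # clears the threshold, so it can receive zero, one, or many labels.
    scores = W @ x
    return [i for i, s in enumerate(scores) if s > threshold]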
12.2.6 Transforming the Training Set
While a perceptron must use a linear function to separate two classes, it is always possible to transform the vectors of a training set before applying a perceptron-based algorithm, in the hope that the transformed data will be linearly separable. An example should give the basic idea.
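One classic transformation of this kind can serve as a minimal sketch (the specific mapping here is illustrative, not necessarily the example developed next): points inside versus outside a circle centered at the origin cannot be separated by any line in the original (x, y) coordinates, but mapping each point to its squared distance from the origin makes the two classes separable by a single threshold.

import numpy as np

def radial_transform(points):
    # Map each 2-D point (x, y) to the single feature x**2 + y**2.
    # Points inside a circle of radius r then satisfy feature < r**2,
    # a boundary a one-weight perceptron can represent, even though no
    # straight line separates the classes in the original plane.
    return np.sum(points ** 2, axis=1, keepdims=True)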