The designed perceptron has a threshold of 2. It has weights 2 and 1 for “viagra” and “nigeria” and smaller weights for “and” and “of.” It also has weight 1 for “the,” which suggests that “the” is as indicative of spam as “nigeria,” something we doubt is true. Nevertheless, this perceptron does classify all examples correctly.
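As a quick sketch of that decision rule in Python: the weights for “viagra,” “nigeria,” and “the” are those stated above, while the “smaller” weights for “and” and “of,” the treatment of the boundary case, and the two sample messages are assumptions made for illustration, not the book's training examples.

import itertools

# Decision rule of the perceptron described above: compare the dot
# product of the weight vector with the 0/1 word vector against the
# threshold.  Weights for "and"/"of" are assumed small values.
weights = {"viagra": 2, "nigeria": 1, "the": 1, "and": 0.5, "of": 0.5}
theta = 2

def looks_like_spam(words):
    score = sum(weights.get(w, 0) for w in words)
    return score >= theta   # treating the boundary case as positive (a convention choice)

print(looks_like_spam({"nigeria", "the"}))  # 1 + 1 = 2 >= 2  -> True
print(looks_like_spam({"the", "of"}))       # 1 + 0.5 < 2     -> False

Note that the first sample message is flagged as spam only because “the” carries as much weight as “nigeria,” which is exactly the oddity remarked on above.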
12.2.5 Multiclass Perceptrons
There are several ways in which the basic idea of the perceptron can be extended. We shall
discuss transformations that enable hyperplanes to serve for more complex boundaries in
the next section. Here, we look at how perceptrons can be used to classify data into many
classes.
Suppose we are given a training set with labels in k different classes. Start by training a perceptron for each class; these perceptrons should each have the same threshold θ. That is, for class i, treat a training example (x, i) as a positive example, and all examples (x, j), where j ≠ i, as negative examples. Suppose that the weight vector of the perceptron for class i is determined to be w_i after training.
Given a new vector x to classify, we compute w_i · x for all i = 1, 2, ..., k. We take the class of x to be the value of i for which w_i · x is the maximum, provided that value is at least θ. Otherwise, x is assumed not to belong to any of the k classes.
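A minimal sketch of this one-vs-rest scheme in Python with NumPy. The learning rate, epoch count, and update rule in the training loop are assumed details for the sketch; the text specifies only the one-perceptron-per-class setup with a shared threshold θ.

import numpy as np

def train_one_vs_rest(X, y, k, theta, eta=0.1, epochs=50):
    """Train one perceptron per class: for class i, examples labeled i
    are positive and all others negative.  Returns a k x d weight matrix."""
    n, d = X.shape
    W = np.zeros((k, d))
    for i in range(k):
        target = np.where(y == i, 1, -1)          # +1 for class i, -1 otherwise
        for _ in range(epochs):
            for x, t in zip(X, target):
                pred = 1 if W[i] @ x > theta else -1
                if pred != t:                     # standard perceptron update
                    W[i] += eta * t * x
    return W

def classify(W, x, theta):
    """Assign x to the class with the largest dot product w_i . x,
    provided that maximum is at least theta; otherwise return None,
    meaning x belongs to none of the k classes."""
    scores = W @ x
    best = int(np.argmax(scores))
    return best if scores[best] >= theta else None

Keeping the same fixed θ for every class during training is what makes the k scores comparable at classification time, so taking the maximum is meaningful.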
For example, suppose we want to classify Web pages into a number of topics, such as
sports, politics, medicine, and so on. We can represent Web pages by a vector with 1 for
each word present in the page and 0 for words not present (of course we would only visualize the pages that way; we wouldn't construct the vectors in reality). Each topic has certain
words that tend to indicate that topic. For instance, sports pages would be full of words
like “win,” “goal,” “played,” and so on. The weight vector for that topic would give higher
weights to the words that characterize that topic.
A new page could be classified as belonging to the topic that gives the highest score when the dot products of the page's vector with the weight vectors for the topics are computed. An alternative interpretation of the situation is to classify a page as belonging to all those topics for which the dot product is above some threshold (presumably a threshold higher than the θ used for training).
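Continuing the sketch above, the alternative multi-label reading changes only the final step; the threshold value passed in here is illustrative and, as noted, presumably higher than the training θ.

def topics_above(W, x, threshold):
    # Multi-label variant: a page belongs to every topic whose score
    # clears the threshold, so it can receive zero, one, or many labels.
    scores = W @ x
    return [i for i, s in enumerate(scores) if s > threshold]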
12.2.6 Transforming the Training Set
While a perceptron must use a linear function to separate two classes, it is always possible to transform the vectors of a training set before applying a perceptron-based algorithm, in the hope that the transformed data will be linearly separable. An example should give the basic idea.
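One classic transformation of this kind can serve as a minimal sketch (the specific mapping here is illustrative, not necessarily the example developed next): points inside versus outside a circle centered at the origin cannot be separated by any line in the original (x, y) coordinates, but mapping each point to its squared distance from the origin makes the two classes separable by a single threshold.

import numpy as np

def radial_transform(points):
    # Map each 2-D point (x, y) to the single feature x**2 + y**2.
    # Points inside a circle of radius r then satisfy feature < r**2,
    # a boundary a one-weight perceptron can represent, even though no
    # straight line separates the classes in the original plane.
    return np.sum(points ** 2, axis=1, keepdims=True)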