Figure 12.13 Perceptrons converge as soon as the separating hyperplane reaches the region between classes
Perceptrons on Streaming Data
While we have viewed the training set as stored data, available for repeated use on any number of passes, Perceptrons
can also be used in a stream setting. That is, we may suppose there is an infinite sequence of training examples, but
that each may be used only once. Detecting email spam is a good example of a training stream. Users report spam
emails and also report emails that were classified as spam but are not. Each email, as it arrives, is treated as a training
example, and modifies the current weight vector, presumably by a very small amount.
If the training set is a stream, we never really converge, and in fact the data points may well not be linearly separable. However, at all times, we have an approximation to the best possible separator. Moreover, if the examples in the stream evolve over time, as would be the case for email spam, then we have an approximation that values recent examples more than examples from the distant past, much like the exponentially decaying windows technique from Section 4.7.
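To make the streaming update concrete, here is a minimal sketch in Python, assuming examples arrive one at a time as (x, y) pairs with y equal to +1 or −1; the function name stream_train, the dense-list representation of x, and the learning rate eta are illustrative choices, not details fixed by the text.

def stream_train(examples, dim, eta=0.01):
    """Maintain an approximate separator w over an endless stream of (x, y) pairs."""
    w = [0.0] * dim
    for x, y in examples:                       # each example is seen exactly once
        dot = sum(wi * xi for wi, xi in zip(w, x))
        if y * dot <= 0:                        # misclassified (or on the boundary)
            for i in range(dim):
                w[i] += eta * y * x[i]          # nudge w by a small amount
        yield w                                 # current approximation after each example

Because eta is small, each example moves w only slightly, so the separator gradually tracks drift in the stream, such as changing spam vocabulary, rather than being dominated by any single example.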
12.2.8 Parallel Implementation of Perceptrons
The training of a perceptron is an inherently sequential process. If the number of dimensions of the vectors involved is huge, then we might obtain some parallelism by computing dot products in parallel. However, as we discussed in connection with Example 12.4, high-dimensional vectors are likely to be sparse and can be represented more succinctly than would be expected from their length.
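As an illustration of that succinct representation (one common choice, not one prescribed by the text), a sparse vector can be stored as a dictionary of its nonzero components, and the dot product then touches only those components.

def sparse_dot(w, x_sparse):
    """Dot product of a dense weight list w with a sparse vector x (index -> nonzero value)."""
    return sum(w[i] * xi for i, xi in x_sparse.items())

w = [0.0] * 1_000_000                          # dense weight vector
x = {12: 1.0, 40517: 2.5, 999999: -1.0}        # nominal length 1,000,000; only 3 nonzeros
print(sparse_dot(w, x))                        # work proportional to 3, not 1,000,000

With so little work per example, parallelizing individual dot products buys almost nothing, which is why the algorithm itself must be reorganized to obtain useful parallelism.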
In order to get significant parallelism, we have to modify the perceptron algorithm slightly, so that many training examples are used with the same estimated weight vector w. As an example, let us formulate the parallel algorithm as a MapReduce job.
The Map Function: Each Map task is given a chunk of training examples, and each Map task knows the current weight vector w. The Map task computes w.x for each feature vector x = [x1, x2, . . . , xk] in its chunk and compares that dot product with the label y, which is +1 or −1, associated with x. If the signs agree, no key-value pairs are produced for this training example. However, if the signs disagree, then for each nonzero component xi of x the key-value pair (i, ηyxi) is produced; here, η is the learning-rate constant used to train this perceptron. Notice that ηyxi is the increment we would like to add to the current ith component of w, and if xi = 0, then there is no need to produce a key-value pair. However, rather than applying each increment immediately, the increments for each key i are summed in the Reduce phase, and each sum is then added to the corresponding component of w.
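A short Python sketch of this Map step, together with a per-key Reduce step, follows; the function names perceptron_map and perceptron_reduce, the sparse-dict representation of x, and the treatment of w.x = 0 as a disagreement are assumptions made here for illustration, not details fixed by the text.

def perceptron_map(chunk, w, eta=0.01):
    """Emit (i, eta*y*x_i) for each nonzero component of every misclassified example."""
    pairs = []
    for x, y in chunk:                                  # x: sparse dict index -> value, y: +1 or -1
        dot = sum(w[i] * xi for i, xi in x.items())     # w.x using only the nonzero x_i
        if y * dot <= 0:                                # sign of w.x disagrees with the label
            for i, xi in x.items():
                pairs.append((i, eta * y * xi))         # increment we would like to add to w[i]
        # if the signs agree, nothing is emitted for this example
    return pairs


def perceptron_reduce(i, increments, w):
    """Sum all increments produced for component i and apply them to w."""
    w[i] += sum(increments)

Deferring the updates in this way lets every Map task work on its chunk with the same w, which is exactly the modification needed to expose parallelism.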