6.4.2 Convergence Theorem for the Perceptron Algorithm
Theorem. If the examples of the training set are linearly separable, the perceptron algorithm finds a solution in a finite number of iterations.
In order to prove the theorem, we assume that the weights are initialized to zero, i.e., using the tabula rasa option. Strictly speaking, that assumption is not necessary; it simply makes the proof easier than starting from arbitrary weight values.
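For concreteness, a minimal sketch of the learning rule under discussion is given below in Python. It is illustrative only (the function name perceptron, the data layout, and the unit learning rate are assumptions, not notation from the book), with the weights initialized to zero as just described.

import numpy as np

def perceptron(X, y, max_iter=10_000):
    """Perceptron learning rule with tabula rasa (zero) initialization.

    X: array of shape (M, n), one example per row.
    y: array of shape (M,), class labels in {-1, +1}.
    Returns the weight vector and the number of corrections performed.
    """
    M, n = X.shape
    w = np.zeros(n)                     # tabula rasa initialization
    for updates in range(max_iter):
        z = y * (X @ w)                 # aligned fields z_k = y_k (x_k . w)
        wrong = np.flatnonzero(z <= 0)  # misclassified examples
        if wrong.size == 0:             # every aligned field positive: converged
            return w, updates
        k = wrong[0]                    # pick one misclassified example
        w = w + y[k] * X[k]             # perceptron correction
    raise RuntimeError("no solution found within max_iter corrections")

For linearly separable data, the theorem guarantees that this loop terminates; the bound T discussed below limits the number of corrections it can perform.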
Since the examples of the training set L_M are assumed to be linearly separable, there exists a weight vector w, hereinafter called the reference perceptron, that correctly classifies all the examples. Without any loss of generality, we assume that w is unitary; otherwise, it would suffice to divide its components by its norm. The stabilities of the examples in L_M with respect to the reference hyperplane are all positive. Since w is unitary, they are identical to the corresponding aligned fields,

$$\gamma_k = y_k \, x_k \cdot w = z_k .$$
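For reference, the general definition of the stability of example k with respect to a hyperplane of weight vector w, on which the previous sentence relies, is the aligned field normalized by the norm of the weights,

$$\gamma_k = \frac{y_k \, x_k \cdot w}{\|w\|} = \frac{z_k}{\|w\|},$$

which indeed reduces to z_k when ‖w‖ = 1.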
To prove the theorem, we derive upper and lower bounds on the norm of the weight vector generated by the algorithm. Those bounds (see Sect. 6.8 “Additional material” at the end of the chapter; a compact version of the argument is also sketched below) are increasing functions of the number of iterations t, but they increase at different rates. The lower bound increases linearly with the number of iterations t, whereas the upper bound increases more slowly, as √t (see Fig. 6.8). The bounds would cross each other, which is absurd, after a number of iterations T given by
$$T = \left(\frac{x_{\max}}{\gamma_{\min}}\right)^{2},$$

where x_max is the norm of the example of L_M with the largest norm, and γ_min is the stability of the example of L_M that has the smallest stability
with respect to the reference perceptron hyperplane. Thus, the perceptron
algorithm must converge, since the number of iterations cannot be larger
than T. Notice that the learning time may be very long, especially if there are examples very close to the reference hyperplane (γ_min small relative to x_max). However, the algorithm may converge in times much smaller than that given by the above relation, for two reasons:
first, because the reference hyperplane w is arbitrary, the value of γ_min that appears in the bound may be particularly small, so that T grossly overestimates the actual number of iterations;
second, the learning time is a random variable that depends on the particular sequence of examples selected for the successive updates.
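For completeness, a compact version of the bounding argument may be sketched as follows (the full derivation is in Sect. 6.8; the notation w(t) for the weight vector after t corrections is introduced here for convenience). Each correction increases w · w(t) by at least γ_min, so that, by the Cauchy-Schwarz inequality and ‖w‖ = 1,

$$\|w(t)\| \;\ge\; w \cdot w(t) \;\ge\; t\,\gamma_{\min} \qquad \text{(lower bound)}.$$

Each correction is performed on an example whose aligned field is non-positive, so that

$$\|w(t)\|^{2} \;\le\; t\,x_{\max}^{2}, \qquad\text{i.e.}\qquad \|w(t)\| \;\le\; \sqrt{t}\,x_{\max} \qquad \text{(upper bound)}.$$

Requiring the lower bound not to exceed the upper bound gives

$$t\,\gamma_{\min} \;\le\; \sqrt{t}\,x_{\max} \quad\Longrightarrow\quad t \;\le\; \left(\frac{x_{\max}}{\gamma_{\min}}\right)^{2} = T.$$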
Remark 1. The expression for the number of iterations T has a simple intuitive interpretation. The correction to the weights performed at each iteration is bounded, because its norm cannot be larger than the norm of the example
 