Fig. 6.21. Probability of linear separation P(α) of M points in N dimensions, shown as a function of α = M/N for N = 10, 20, 40, 100 and N → ∞.
For large N, the probability of linear separation is almost 1 if M < 2N, and drops abruptly to zero beyond M/N = 2. Hence, although we cannot guarantee that any training set with M < 2N is linearly separable, this is highly probable; the larger N and M, the larger the probability of linear separation when α < 2. In the thermodynamic limit N → ∞, M → ∞, with α ≡ M/N = constant, the typical capacity of the perceptron is α_c = 2. Strictly speaking, α_c indicates the transition between the regime where linear separability has probability 1 and the regime where that probability vanishes, in the thermodynamic limit. It is important to emphasize that the behavior of P_LS(L_M) is already very close to the asymptotic thermodynamic limit for values of N of the order of 100. That is why typical learning properties provide very useful hints for systems with large, but finite, dimension N.
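A short numerical sketch can illustrate this transition. It assumes Cover's classical counting formula for M points in general position in dimension N, P_LS = 2^(1-M) Σ_{k=0}^{N-1} C(M-1, k); the function name p_linear_separation and the chosen values of N and α are illustrative only, not taken from the text.

    from math import comb

    def p_linear_separation(M, N):
        # Probability that M randomly labelled points in general position
        # in dimension N are linearly separable (Cover's counting formula).
        return 2.0 ** (1 - M) * sum(comb(M - 1, k) for k in range(N))

    # The drop around alpha = M/N = 2 sharpens as N grows:
    for N in (10, 20, 40, 100):
        probs = [round(p_linear_separation(int(a * N), N), 3)
                 for a in (1.0, 1.5, 2.0, 2.5, 3.0)]
        print(N, probs)

For every N the probability equals 1 for α ≤ 1 and exactly 1/2 at α = 2, while the curve around α = 2 becomes steeper as N increases, in agreement with Fig. 6.21.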
6.8 Additional Theoretical Material
6.8.1 Bounds to the Number of Iterations of the Perceptron Algorithm
In this section, we provide the computation of the bounds used to prove the convergence of the perceptron algorithm. In order to obtain a lower bound on the norm of the weight vector, we take into account the fact that w* is unitary: ||w*|| = 1. We denote by k(t) the label of the example learnt at iteration t. After iteration t, the norm of the weight vector w(t + 1) can be written as

||w(t + 1)|| = ||w(t + 1)|| ||w*|| ≥ w(t + 1) · w*.
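As a complement, here is a minimal numerical sketch of the algorithm and of the inequality used for this lower bound. The unit teacher vector w_star, the random data, and the parameter values are illustrative assumptions, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 20, 40
    w_star = rng.normal(size=N)
    w_star /= np.linalg.norm(w_star)   # unitary reference vector, ||w*|| = 1
    X = rng.normal(size=(M, N))
    y = np.sign(X @ w_star)            # labels produced by w*, hence separable

    w = np.zeros(N)
    t = 0
    while True:
        wrong = np.flatnonzero(y * (X @ w) <= 0)  # misclassified examples
        if wrong.size == 0:
            break
        k = wrong[0]                   # index k(t) of the example learnt at step t
        w = w + y[k] * X[k]            # perceptron update
        t += 1
        # inequality used for the lower bound: ||w(t+1)|| >= w(t+1) . w*
        assert np.linalg.norm(w) >= w @ w_star - 1e-12
    print("converged after", t, "updates")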