Discrimination - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

Notice that if the components x i of the input patterns are real-valued, the

probability that n + 1 points be on a same hyperplane is zero. In the case of

binary input components, that probability does not vanish. In the following

we restrict to points in general position.

As mentioned above, if m

n , all dichotomies of m points in general

position are linearly separable. The following computation is interesting for

m>n .Wedenoteby D ( m,n ) the number of dichotomies of L m generated by

the hyperplanes in dimension n . For points in general position, that number

depends on m and n only. We have

≤

D ( m, 1) = 2; D (1 ,n ) = 2

for all m,n

since there are two ways of separating (by assigning classes ± 1) m points

in 1 dimension, and there are also two ways of assigning a class to a single

point in dimension n , with a hyperplane containing the origin. Consider now a

training set that includes the patterns of L m and a new point x m +1 , L m +1 =

L m ∪

x m +1 . It may happen that two hyperplanes that generated the same

dichotomy in L m assign two different classes to x m +1 . In that case, there exists

a hyperplane H 0 , containing x m +1 , such that it generates the same dichotomy

on the patterns of L m . H 0 is ambiguous with respect to x m +1 . Then, H 0

generates a dichotomy of L m

in the space of dimension n

−

1 orthogonal to

H 0 . There is a one-to-one correspondence between the D ( m,n

−

1) dichotomies

in the space of dimension n

1 and the ambiguous dichotomies of the new

point in the n -dimensional space. Since there exist D ( m,n ) dichotomies of

L m , and each one induces two dichotomies of L m +1 , we obtain the following

recurrence [Cover 1965]:

−

D ( m +1 ,n )= D ( m,n )+ D ( m,n

−

1)

from which the expression of D ( M,N ) is obtained.

References

1. Anlauf J.K., Biehl, M. [1989], The AdaTron: An adaptive perceptron algorithm,

Europhys. Lett. 10, pp 687-692

2. Baum E.B., Haussler D. [1989], What size net gives valid generalization?, Neural

Computation 1, pp 151-160

3. Blake, C.L., Merz C.J. [1998], UCI Repository of machine learning databases,

available from http://www.ics.uci.edu/mlearn/MLRepository.html

4. Buhot A., Gordon M.B. [1997], Cost function and pattern distribution of the

Bayesian perceptron, Phys. Lett. A 228, pp 73-78

5. Buhot A., Torres Moreno J.M., Gordon M.B. [1997], Finite size scaling of the

Bayesian perceptron, Phys. Rev. E 55, pp 7434-7440

6. Buhot A., Torres Moreno J.M., Gordon M.B. [1997], Numerical simulations of

an optimal algorithm for supervised learning, European Symposium on Artificial

Neural Networks , Proceedings, M. Verleysen ed., pp 151-156

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home