where superscript T denotes the transpose of a matrix. Then a linear separator exists if and only if there exists a vector w such that

M w > 0,

or, equivalently, if there exist a vector y > 0 and a vector w such that

M w = y.

Then one has w = M* y, where M* is the pseudo-inverse of matrix M: M* = M^T (M M^T)^{-1}, which can be computed by the Cholesky method [Press 1992].
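A minimal sketch of that computation, assuming M has full row rank so that M M^T is invertible (otherwise the Moore-Penrose pseudo-inverse, e.g. np.linalg.pinv, plays the same role); the function name and shapes are illustrative:

```python
import numpy as np

def pseudo_inverse_solve(M: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Compute w = M* y with M* = M^T (M M^T)^{-1} via a Cholesky factorization."""
    G = M @ M.T                   # Gram matrix; symmetric positive definite if M has full row rank
    L = np.linalg.cholesky(G)     # factor G = L L^T with L lower triangular
    z = np.linalg.solve(L, y)     # forward solve: L z = y
    z = np.linalg.solve(L.T, z)   # backward solve: L^T z' = z, so that G z' = y
    return M.T @ z                # w = M^T (M M^T)^{-1} y
```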
The Ho and Kashyap algorithm is as follows:

Initialization (iteration 0):
w(0) = M* y(0), where y(0) is an arbitrary positive vector.

Iteration i:
α(i) = M w(i) − y(i)
y(i+1) = y(i) + ρ (α(i) + |α(i)|)
w(i+1) = w(i) + ρ M* (α(i) + |α(i)|)

where ρ is a positive scalar smaller than 1, and |α(i)| denotes the vector whose components are the absolute values of the components of α(i); hence α(i) + |α(i)| retains only the positive components of α(i), doubled.

If one of the components of y(i) < 0, then the examples are not linearly separable.
If all components of M w(i) > 0, then the examples are linearly separable and w(i) is a solution.
The algorithm converges after a finite number of iterations.
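A compact NumPy sketch of this iteration, with the illustrative choices y(0) = (1, …, 1) and ρ = 0.5; np.linalg.pinv supplies the pseudo-inverse M*, and the loop is additionally stopped when the correction α(i) + |α(i)| vanishes (all components of α(i) non-positive), since the updates can then make no further progress:

```python
import numpy as np

def ho_kashyap(M: np.ndarray, rho: float = 0.5, n_iter: int = 10_000, tol: float = 1e-12):
    """Return (w, separable) for the system M w > 0."""
    M_star = np.linalg.pinv(M)         # pseudo-inverse M* of M
    y = np.ones(M.shape[0])            # y(0): an arbitrary positive vector
    w = M_star @ y                     # w(0) = M* y(0)
    for _ in range(n_iter):
        if np.all(M @ w > 0):          # all components of M w(i) > 0:
            return w, True             # linearly separable, w(i) is a solution
        alpha = M @ w - y              # alpha(i) = M w(i) - y(i)
        step = alpha + np.abs(alpha)   # alpha(i) + |alpha(i)|: positive components, doubled
        if np.all(step < tol):         # correction vanishes: iteration can make no progress
            return w, False            # the examples are not linearly separable
        y = y + rho * step             # y(i+1)
        w = w + rho * (M_star @ step)  # w(i+1)
    return w, False                    # no decision reached within n_iter iterations
```

For a two-class problem one would typically build M by stacking the rows z_k [x_k, 1], where x_k is an example, the appended 1 accounts for the bias, and z_k = ±1 is the class label, so that M w > 0 means every example is correctly classified; this construction is an assumption here, matching the usual presentation of linear separability.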
References
1. Antoniadis A., Berruyer J., Carmona R. [1992], Régression non linéaire et applications, Economica
2. Barron A. [1993], Universal approximation bounds for superposition of a sigmoidal function, IEEE Transactions on Information Theory, 39, pp 930-945
3. Baum E.B., Wilczek F. [1988], Supervised learning of probability distributions by neural networks, Neural Information Processing Systems, pp 52-61
4. Benveniste A., Juditsky A., Delyon B., Zhang Q., Glorennec P.-Y. [1994], Wavelets in identification, 10th IFAC Symposium on Identification, Copenhagen
5. Bishop C. [1995], Neural Networks for Pattern Recognition, Oxford University Press
6. Bridle J.S. [1990], Probabilistic interpretation of feedforward classification network outputs, with relationship to statistical pattern recognition, Neurocomputing: Algorithms, Architectures and Applications, Springer, pp 227-236
7. Broomhead D.S., Lowe D. [1988], Multivariable functional interpolation and adaptive networks, Complex Systems, 2, pp 321-355