4. The delta-bar-delta rule [101], which replaces the usual scalar learning rate parameter with a diagonal matrix (assuming that the weights evolve independently in time) and sets each diagonal element as a function of the sign of the corresponding component of the local gradient (see the sketch after this list)
5. The quickprop rule [55], which considers the weights as quasi-independent
and approximates the error surface by a quadratic polynomial
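As a concrete illustration of the delta-bar-delta rule in item 4, the following Python sketch keeps one learning rate per weight and adapts it from the sign agreement between the current gradient component and an exponential average of past components. The hyperparameter names (kappa, phi, theta) and their default values are illustrative choices, not taken from [101].

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, bar_delta,
                         kappa=0.01, phi=0.5, theta=0.7):
    """One delta-bar-delta update with a per-weight (diagonal) learning rate.

    w, grad, lr, bar_delta: arrays of the same shape.
    kappa, phi, theta: illustrative hyperparameters (additive increase,
    multiplicative decrease, averaging factor), not values from the text.
    """
    agree = grad * bar_delta                        # sign comparison per component
    lr = np.where(agree > 0, lr + kappa,            # same sign: grow additively
                  np.where(agree < 0, lr * phi,     # sign flip: shrink multiplicatively
                           lr))                     # zero product: leave unchanged
    w_new = w - lr * grad                           # step scaled by the diagonal rates
    bar_delta_new = (1 - theta) * grad + theta * bar_delta  # running gradient average
    return w_new, lr, bar_delta_new
```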
Other methods can be found in [21]. However, batch techniques can accelerate the MC computation more effectively. For MCA, if the incoming inputs are collected in blocks and fed to the neuron, which changes its weights only after the entire block has been presented, all methods typical of batch learning can be used (block processing and batch processing are considered distinct here because in the latter the block of data remains constant): for example, the conjugate gradient methods [59,66,85,86] (for their implementation as an acceleration method for MCA EXIN, see [24]).
Remark 78 (Best Block Technique) Among the possible numerical techniques for minimizing the Rayleigh quotient, it is not practical to use the variable metric (VM, also known as quasi-Newton) and Newton descent techniques, because the RQ Hessian matrix at the minimum is singular and hence its inverse does not exist. On the other hand, the conjugate gradient approach can deal with the singular $H_r$ [85, pp. 256-259; 196] and can be used to accelerate MCA EXIN in block mode [24].
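To make the remark concrete, here is a minimal sketch of conjugate gradient minimization of the Rayleigh quotient on one block of data, using SciPy's general-purpose CG optimizer on a toy autocorrelation matrix. This is a generic stand-in, not the MCA EXIN block scheme of [24]; the data and dimensions are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def rayleigh_quotient(w, R):
    """r(w) = w^T R w / w^T w; its minimum is the smallest eigenvalue of R."""
    return (w @ R @ w) / (w @ w)

def rq_gradient(w, R):
    """Gradient 2(Rw - r(w) w) / (w^T w); it vanishes at every eigenvector,
    which is why the RQ Hessian at the minimum is singular."""
    r = rayleigh_quotient(w, R)
    return 2.0 * (R @ w - r * w) / (w @ w)

# Toy block of data: R is the autocorrelation matrix of one block.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
R = X.T @ X / len(X)

w0 = rng.standard_normal(4)
res = minimize(rayleigh_quotient, w0, args=(R,), jac=rq_gradient, method='CG')
w_min = res.x / np.linalg.norm(res.x)   # estimate of the minor component
```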
2.9 TLS HYPERPLANE FITTING
An important application of the MCA is hyperplane fitting [195] by means of orthogonal regression (TLS). Given a set of $N$ $n$-dimensional data points $x(i)$, the problem is to find a hyperplane model
\[
w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_0 = w^T x + w_0 = 0 \tag{2.182}
\]
such that the sum of the squared perpendicular distances of the model from the data is minimized (total least squares error criterion $E_{\mathrm{TLS}}$; see [98]). Hence,
\[
E_{\mathrm{TLS}} = \sum_{i=1}^{N} \frac{\left( w^T x(i) + w_0 \right)^2}{w^T w}
= N \, \frac{w^T R w + 2 w_0 w^T e + w_0^2}{w^T w} \tag{2.183}
\]
where $e = (1/N) \sum_{i=1}^{N} x(i)$ and $R = (1/N) \sum_{i=1}^{N} x(i)\, x(i)^T$ are the mean vector and the autocorrelation matrix of the data set. From $dE_{\mathrm{TLS}}/dw = 0$ it follows that the critical points of $E_{\mathrm{TLS}}$ should satisfy
\[
R w + w_0 e - \lambda w = 0, \qquad
\lambda = \frac{w^T R w + 2 w_0 w^T e + w_0^2}{w^T w} \tag{2.184}
\]
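Setting $dE_{\mathrm{TLS}}/dw_0 = 0$ in (2.183) additionally gives $w_0 = -w^T e$, so the optimal hyperplane passes through the centroid of the data and $w$ reduces to the minor eigenvector of the covariance matrix $R - e e^T$. The sketch below solves the fit with a direct eigensolver as a stand-in for an MCA neuron; the synthetic plane and noise level are illustrative assumptions.

```python
import numpy as np

def tls_hyperplane(X):
    """TLS (orthogonal regression) fit of w^T x + w_0 = 0.

    w_0 = -w^T e puts the hyperplane through the centroid, and w is the
    minor eigenvector of the covariance R - e e^T; np.linalg.eigh is used
    here as a direct stand-in for an MCA neuron."""
    e = X.mean(axis=0)                       # mean vector e
    C = (X - e).T @ (X - e) / len(X)         # covariance matrix R - e e^T
    _, vecs = np.linalg.eigh(C)              # eigenvalues in ascending order
    w = vecs[:, 0]                           # minor component (unit norm)
    return w, -w @ e

# Illustrative data: points on the plane x_1 + 2 x_2 - x_3 + 0.5 = 0, plus noise.
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0, -1.0])
P = rng.standard_normal((300, 3))
P -= ((P @ w_true + 0.5) / (w_true @ w_true))[:, None] * w_true  # project onto plane
X = P + 0.01 * rng.standard_normal(P.shape)
w_hat, w0_hat = tls_hyperplane(X)            # recovers the plane up to sign and scale
```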