of the input space can render features irrelevant.
First we assume that layers 3 and 4 have been trained so that they comprise a model of the pattern-generating process, and that B is the identity matrix. Then the coefficients B_nr can be adapted by gradient descent with the relevance ρ_n of the transformed feature x_n as the target function. Modifying B_nr means changing the relevance of x_n by adding x_r to it with some weight B_nr. This can be done online, that is, for every training vector x_p, without storing the whole training set. The diagonal elements B_nn are constrained to be constant 1, because a feature must not be rendered irrelevant by scaling itself. This in turn guarantees that no information will be lost. B_nr will be adapted only under the condition that ρ_n < ρ_r, so that the relevance of a feature can be decreased only by some more relevant feature. The coefficients are adapted by the learning rule
B_{nr}^{\mathrm{new}} = B_{nr}^{\mathrm{old}} - \mu \, \frac{\partial \rho_n}{\partial B_{nr}}    (6.35)
with the learning rate μ and the partial derivative

\frac{\partial \rho_n}{\partial B_{nr}} = \frac{1}{PJ} \sum_{p} \sum_{j} \frac{x_{pn} - m_{jn}}{\sigma_{jn}} \, (x_{pr} - m_{jr}) .    (6.36)
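As a concrete illustration, the following is a minimal sketch in Python/NumPy of one online adaptation step according to (6.35) and (6.36), applied to a single training vector. The names (B, rho, x_p, m, sigma, adapt_B_online) and the single-sample gradient estimate are assumptions made for this sketch; the relevance values rho are treated as given, while the unit diagonal and the gating by a more relevant feature follow the constraints stated above.

```python
import numpy as np

def adapt_B_online(B, rho, x_p, m, sigma, mu=1e-3):
    """One online gradient step for the off-diagonal coefficients B[n, r].

    B     : (N, N) transform matrix with unit diagonal
    rho   : (N,)   current relevance of each transformed feature (assumed given)
    x_p   : (N,)   one training vector (features x_pn)
    m     : (J, N) centre vectors m_jn of the trained layers
    sigma : (J, N) corresponding widths sigma_jn
    mu    : learning rate
    """
    N = B.shape[0]
    for n in range(N):
        for r in range(N):
            if n == r:
                continue              # B_nn stays 1: a feature must not scale itself
            if not rho[n] < rho[r]:
                continue              # only a more relevant x_r may act on x_n
            # single-sample estimate of the gradient (6.36); the average over p
            # is replaced by the current pattern in the online setting
            grad = np.mean((x_p[n] - m[:, n]) / sigma[:, n] * (x_p[r] - m[:, r]))
            B[n, r] -= mu * grad      # descent step (6.35)
    np.fill_diagonal(B, 1.0)          # keep the diagonal fixed to 1
    return B
```

Calling this routine once per training vector corresponds to the online adaptation described above, without storing the whole training set.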
In the learning procedure, which is based, for example, on [151], we minimize, according to the LMS criterion, the target function
E = \frac{1}{2} \sum_{p=0}^{P} \left| y(x_p) - \Phi(x_p) \right|^{2}    (6.37)
where P is the size of the training set. The neural network has some useful features, such as automatic allocation of neurons, discarding of degenerate and inactive neurons, and variation of the learning rate depending on the number of allocated neurons.
The relevance of a feature is optimized by gradient descent:
\rho_i^{\mathrm{new}} = \rho_i^{\mathrm{old}} - \eta \, \frac{\partial E}{\partial \rho_i}    (6.38)
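The relevance update in (6.38) can be sketched in the same style. Because the section does not specify how the network output Φ depends on the relevances, the sketch below treats Φ as a user-supplied callable and estimates ∂E/∂ρ_i by central differences; phi, eta, and h are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def lms_error(phi, rho, X, y):
    """E = 1/2 * sum_p |y_p - Phi(x_p)|^2, cf. (6.37)."""
    return 0.5 * sum(abs(y_p - phi(x_p, rho)) ** 2 for x_p, y_p in zip(X, y))

def update_relevances(phi, rho, X, y, eta=1e-2, h=1e-5):
    """One gradient-descent step on the relevances, cf. (6.38).

    phi : callable phi(x, rho) -> network output (assumed interface)
    rho : (N,) current relevance vector
    X   : (P, N) training vectors; y : (P,) desired outputs
    The partials dE/d(rho_i) are estimated numerically, since their analytic
    form depends on the concrete network architecture.
    """
    rho_new = rho.astype(float).copy()
    for i in range(len(rho)):
        d = np.zeros_like(rho, dtype=float)
        d[i] = h
        grad_i = (lms_error(phi, rho + d, X, y) - lms_error(phi, rho - d, X, y)) / (2 * h)
        rho_new[i] = rho[i] - eta * grad_i   # rho_i^new = rho_i^old - eta * dE/drho_i
    return rho_new
```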
Based on the newly introduced relevance measure and the change in the architecture, we get the following correction equations for the neural