as an approximate realization of the function $f$; that is,

$$\left| F(x_1, \ldots, x_{m_0}) - f(x_1, \ldots, x_{m_0}) \right| < \epsilon \qquad (7.30)$$

for all $x_1, x_2, \ldots, x_{m_0}$ that lie in the input space.
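To make this condition concrete, a minimal sketch in Python (the names mlp_output and target_f are illustrative stand-ins for $F$ and $f$, not notation from the text) checks (7.30) over a finite set of sampled input vectors:

def satisfies_approximation(mlp_output, target_f, sample_inputs, eps):
    """Check the condition (7.30): |F(x) - f(x)| < eps for every sampled input x."""
    return all(abs(mlp_output(x) - target_f(x)) < eps for x in sample_inputs)

In practice, the condition can only be verified over a finite sampling of the input space, which is what the sketch does.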
Once a network architecture with the universal approximation property is established, a fundamental question concerns how to obtain its parameters, i.e., the synaptic weights of the network. The MLP can be thought of as a nonlinear filter and, consequently, we may consider the methodology we have used earlier: to choose a criterion and derive a suitable algorithm. In this sense, our classical procedure consists of obtaining the gradient of the cost function with respect to the synaptic weights, in the spirit of the steepest-descent algorithm, as seen in Section 3.4. This is valid, but a difficulty emerges, as it is not straightforward to assess the influence of the error signal formed at the output layer on weights belonging to previous layers. From this difficulty arises a specific method for adapting the parameters of an MLP, which is presented next.
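As a rough illustration of this classical procedure, the Python sketch below (the names cost_gradient and step_size are chosen here for illustration only) performs a single steepest-descent update of a weight vector; the difficulty mentioned above lies precisely in computing this gradient for the weights of the hidden layers.

import numpy as np

def steepest_descent_step(weights, cost_gradient, step_size):
    """One steepest-descent iteration: move the weights against the cost gradient."""
    return weights - step_size * cost_gradient(weights)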
7.4.2.1 The Backpropagation Algorithm
The context in which the BPA is defined is that of supervised filtering. In such a scenario, we can count on having at hand a set of $N_{\text{samples}}$ available samples of input stimuli together with the corresponding desired values. For this data set, it is possible to build a cost function given by
$$J_{BP\,\text{averaged}} = \frac{1}{N_{\text{samples}}} \sum_{n=1}^{N_{\text{samples}}} \left( d(n) - y(n) \right)^2 \qquad (7.31)$$

where $d(n)$ is the desired output. It is worth noting that we are dealing directly with a time-average, similarly to the least-squares procedure
described in Section 3.5.1, without resorting to statistical expectations that
characterize the Wiener approach. The objective is to minimize the cost
function in (7.31) with respect to all weights, which is carried out by clev-
erly employing the chain rule in the process of differentiation. In order to
simplify this process, let us consider for a while the case of $N_{\text{samples}} = 1$, so that the cost function becomes the instantaneous quadratic error, expressed by
$$J_{BP}(n) = e^2(n) = \left( d(n) - y(n) \right)^2 \qquad (7.32)$$
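The two costs can be contrasted in code. The Python sketch below (assuming numpy arrays d and y holding the desired and produced outputs over the available instants; the function names are illustrative) computes the time-averaged cost of (7.31) and the instantaneous cost of (7.32):

import numpy as np

def j_bp_averaged(d, y):
    """Equation (7.31): squared error averaged over all N_samples instants."""
    return np.mean((d - y) ** 2)

def j_bp_instantaneous(d_n, y_n):
    """Equation (7.32): instantaneous quadratic error e^2(n) at a single instant n."""
    e_n = d_n - y_n
    return e_n ** 2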
In fact, Equation 7.32 is a stochastic approximation of the MSE, analogous to that used in the derivation of the LMS algorithm. In order to differentiate $J_{BP}(n)$ with respect to the weights of the hidden layer, we make use of the chain rule.
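As a preview of how the chain rule carries the output error back to the hidden layer, the sketch below implements one stochastic-gradient update of the instantaneous cost (7.32) for a single-hidden-layer MLP with tanh activations and a linear output neuron; this particular architecture and the variable names are assumptions made here for illustration, not the book's notation.

import numpy as np

def backprop_step(x, d, W_hidden, w_output, mu):
    """One stochastic-gradient update of J_BP(n) = (d(n) - y(n))^2 via the chain rule."""
    # Forward pass
    v = W_hidden @ x                 # hidden-layer pre-activations
    z = np.tanh(v)                   # hidden-layer outputs
    y = w_output @ z                 # scalar network output (linear output neuron)
    e = d - y                        # output error e(n)

    # Backward pass: chain rule from the squared error down to each layer's weights
    grad_output = -2.0 * e * z                          # dJ/dw_output
    delta_hidden = -2.0 * e * w_output * (1.0 - z**2)   # dJ/dv, through the tanh derivative
    grad_hidden = np.outer(delta_hidden, x)             # dJ/dW_hidden

    # Steepest-descent updates of both layers
    return W_hidden - mu * grad_hidden, w_output - mu * grad_output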
 