as an approximate realization of the function $f$; that is,

$$\left| F(x_1, \ldots, x_{m_0}) - f(x_1, \ldots, x_{m_0}) \right| < \epsilon \qquad (7.30)$$

for all $x_1, x_2, \ldots, x_{m_0}$ that lie in the input space.
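To make this condition concrete, a minimal sketch in Python (the names mlp_output and target_f are illustrative stand-ins for $F$ and $f$, not notation from the text) checks (7.30) over a finite set of sampled input vectors:

def satisfies_approximation(mlp_output, target_f, sample_inputs, eps):
    """Check the condition (7.30): |F(x) - f(x)| < eps for every sampled input x."""
    return all(abs(mlp_output(x) - target_f(x)) < eps for x in sample_inputs)

In practice, the condition can only be verified over a finite sampling of the input space, which is what the sketch does.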
Once a network architecture with the universal approximation property is established, a fundamental question concerns how to obtain its parameters, i.e., the synaptic weights of the network. The MLP can be thought of as a nonlinear filter and, consequently, we may consider the methodology we have used earlier: to choose a criterion and derive a suitable algorithm. In this sense, our classical procedure consists of obtaining the gradient of the cost function with respect to the synaptic weights, in the spirit of the steepest-descent algorithm, as seen in Section 3.4. This is valid, but a difficulty emerges, as it is not straightforward to assess the influence of the error signal formed at the output layer on weights belonging to previous layers. From this difficulty arises a specific method for adapting the parameters of an MLP, which is presented next.
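As a rough illustration of this classical procedure, the Python sketch below (the names cost_gradient and step_size are chosen here for illustration only) performs a single steepest-descent update of a weight vector; the difficulty mentioned above lies precisely in computing this gradient for the weights of the hidden layers.

import numpy as np

def steepest_descent_step(weights, cost_gradient, step_size):
    """One steepest-descent iteration: move the weights against the cost gradient."""
    return weights - step_size * cost_gradient(weights)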
7.4.2.1 The Backpropagation Algorithm
The context in which the BPA is defined is that of supervised filtering. In such a scenario, we can count on having at hand a set of $N_{\text{samples}}$ available samples of input stimuli together with the corresponding desired values. For this data set, it is possible to build a cost function given by
$$J_{BP\,\text{averaged}} = \frac{1}{N_{\text{samples}}} \sum_{n=1}^{N_{\text{samples}}} \left( d(n) - y(n) \right)^2 \qquad (7.31)$$

where $d(n)$ is the desired output. It is worth noting that we are dealing directly with a time-average, similarly to the least-squares procedure
described in Section 3.5.1, without resorting to statistical expectations that
characterize the Wiener approach. The objective is to minimize the cost
function in (7.31) with respect to all weights, which is carried out by clev-
erly employing the chain rule in the process of differentiation. In order to
simplify this process, let us consider for a while the case of $N_{\text{samples}} = 1$, so that the cost function becomes the instantaneous quadratic error, expressed by
$$J_{BP}(n) = e^2(n) = \left( d(n) - y(n) \right)^2 \qquad (7.32)$$
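The two costs can be contrasted in code. The Python sketch below (assuming numpy arrays d and y holding the desired and produced outputs over the available instants; the function names are illustrative) computes the time-averaged cost of (7.31) and the instantaneous cost of (7.32):

import numpy as np

def j_bp_averaged(d, y):
    """Equation (7.31): squared error averaged over all N_samples instants."""
    return np.mean((d - y) ** 2)

def j_bp_instantaneous(d_n, y_n):
    """Equation (7.32): instantaneous quadratic error e^2(n) at a single instant n."""
    e_n = d_n - y_n
    return e_n ** 2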
In fact, Equation 7.32 is a stochastic approximation of the MSE, analogous to that used in the derivation of the LMS algorithm. In order to differentiate $J_{BP}(n)$ with respect to the weights of the hidden layer, we make use of the chain rule.
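As a preview of how the chain rule carries the output error back to the hidden layer, the sketch below implements one stochastic-gradient update of the instantaneous cost (7.32) for a single-hidden-layer MLP with tanh activations and a linear output neuron; this particular architecture and the variable names are assumptions made here for illustration, not the book's notation.

import numpy as np

def backprop_step(x, d, W_hidden, w_output, mu):
    """One stochastic-gradient update of J_BP(n) = (d(n) - y(n))^2 via the chain rule."""
    # Forward pass
    v = W_hidden @ x                 # hidden-layer pre-activations
    z = np.tanh(v)                   # hidden-layer outputs
    y = w_output @ z                 # scalar network output (linear output neuron)
    e = d - y                        # output error e(n)

    # Backward pass: chain rule from the squared error down to each layer's weights
    grad_output = -2.0 * e * z                          # dJ/dw_output
    delta_hidden = -2.0 * e * w_output * (1.0 - z**2)   # dJ/dv, through the tanh derivative
    grad_hidden = np.outer(delta_hidden, x)             # dJ/dW_hidden

    # Steepest-descent updates of both layers
    return W_hidden - mu * grad_hidden, w_output - mu * grad_output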
 