In the section devoted to the definitions, we showed that the output of a feedforward neural network with a layer of sigmoid activation functions (multilayer Perceptron) is nonlinear with respect to the parameters of the network, whereas the output of a network of radial basis functions with fixed centers and widths, or of wavelets with fixed translations and dilations, is linear with respect to the parameters. Similarly, a polynomial is linear with respect to the coefficients of the monomials. Thus, neurons with sigmoid activation functions provide more parsimonious approximations than polynomials, radial basis functions with fixed centers and widths, or wavelets with fixed translations and dilations. Conversely, if the centers and widths of Gaussian radial basis functions, or the translations and dilations of wavelets, are considered as adjustable parameters, there is no mathematically proven advantage of any one of those models over the others. However, practical considerations may lead one to favor one of the models over the others: prior knowledge of the type of nonlinearity that is required, local vs. nonlocal functions, ease and speed of training (see Chap. 2, section "Parameter initialization"), ease of hardware integration into silicon, etc.
The origin of parsimony can be understood qualitatively as follows. Consider a model that is linear with respect to its parameters, such as a polynomial model, e.g.,

g(x) = 4 + 2x + 4x² − 0.5x³.

The output g(x) of the model is a linear combination of the functions y = 1, y = x, y = x², y = x³, with parameters (weights) w0 = 4, w1 = 2, w2 = 4, w3 = −0.5. The shapes of those functions are fixed.
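As an illustration (not from the original text), the short Python sketch below makes "linear in the parameters" concrete: because the basis functions 1, x, x², x³ are fixed, the weights of such a model can be recovered by ordinary least squares; the data range and sample size are arbitrary choices.

```python
import numpy as np

# Illustrative sketch: a model that is linear in its parameters.
# The basis functions 1, x, x^2, x^3 are fixed; only the weights are free.
x = np.linspace(-2.0, 2.0, 50)
y = 4 + 2 * x + 4 * x**2 - 0.5 * x**3          # values of g(x)

# Design matrix: one column per fixed basis function.
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Because g is linear in the weights, ordinary least squares recovers them exactly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)   # approximately [4.0, 2.0, 4.0, -0.5]
```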
Consider a neural model as shown on Fig. 1.8, for which the equation is

g(x) = 0.5 − 2 tanh(10x + 5) + 3 tanh(x + 0.25) − 2 tanh(3x − 0.25).

This model is also a linear combination of functions (y = 1, y = tanh(10x + 5), y = tanh(x + 0.25), y = tanh(3x − 0.25)), but the shapes of these functions depend on the values of the parameters of the connections between the inputs and the hidden neurons. Thus, instead of combining functions whose shapes are fixed, one combines functions whose shapes are adjustable through the parameters of some connections. That provides extra degrees of freedom, which can be exploited to use a smaller number of functions, hence a smaller number of parameters. That is the very essence of parsimony.
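By contrast, the following sketch (again illustrative, with a hypothetical parameter layout) evaluates the neural model of Fig. 1.8: the output is still a weighted sum, but the shape of each tanh term is set by the parameters of the connections between the input and the hidden neurons.

```python
import numpy as np

def neural_model(x, w):
    """Linear combination of tanh functions whose shapes (slope and position)
    depend on the input-to-hidden parameters.  Hypothetical layout:
    w = [bias, a1, b1, c1, a2, b2, c2, a3, b3, c3]."""
    out = w[0]
    for a, b, c in zip(w[1::3], w[2::3], w[3::3]):
        out = out + a * np.tanh(b * x + c)   # shape of tanh(b*x + c) changes with b and c
    return out

# Parameters of g(x) = 0.5 - 2 tanh(10x + 5) + 3 tanh(x + 0.25) - 2 tanh(3x - 0.25)
w = [0.5, -2.0, 10.0, 5.0, 3.0, 1.0, 0.25, -2.0, 3.0, -0.25]
print(neural_model(np.linspace(-1.0, 1.0, 5), w))
```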
1.1.3.3 An Elementary Example
Consider the function

y = 16.71x² − 0.075.

We sample 20 equally spaced points that are used for training a multilayer Perceptron with two hidden neurons whose nonlinearity is tan⁻¹, as shown on Fig. 1.9(a). Training is performed with the Levenberg-Marquardt algorithm.
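The book gives no code for this experiment; the sketch below shows one way it could be reproduced, assuming a sampling interval of [−1, 1] (not specified above), a hypothetical parameter layout for the small network, and scipy.optimize.least_squares with method="lm" as the Levenberg-Marquardt optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

# 20 equally spaced training points sampled from y = 16.71 x^2 - 0.075
# (the interval [-1, 1] is an assumption).
x = np.linspace(-1.0, 1.0, 20)
y = 16.71 * x**2 - 0.075

def model(p, x):
    """MLP with two hidden neurons (arctan nonlinearity) and a linear output.
    Hypothetical parameter layout: p = [w1, b1, w2, b2, v0, v1, v2]."""
    h1 = np.arctan(p[0] * x + p[1])
    h2 = np.arctan(p[2] * x + p[3])
    return p[4] + p[5] * h1 + p[6] * h2

def residuals(p):
    return model(p, x) - y

# Small random initialization, then Levenberg-Marquardt on the residuals.
rng = np.random.default_rng(0)
p0 = rng.normal(scale=0.1, size=7)
fit = least_squares(residuals, p0, method="lm")
print("root-mean-square error:", np.sqrt(np.mean(fit.fun**2)))
```

In practice, training would be repeated from several random initializations and the best result kept, since the cost function of a model that is nonlinear in its parameters generally has local minima.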