In the section devoted to the definitions, we showed that the output of a feedforward neural network with a layer of sigmoid activation functions (multilayer Perceptron) is nonlinear with respect to the parameters of the network, whereas the output of a network of radial basis functions with fixed centers and widths, or of wavelets with fixed translations and dilations, is linear with respect to the parameters. Similarly, a polynomial is linear with respect to the coefficients of the monomials. Thus, neurons with sigmoid activation functions provide more parsimonious approximations than polynomials, radial basis functions with fixed centers and widths, or wavelets with fixed translations and dilations. Conversely, if the centers and widths of Gaussian radial basis functions, or the translations and dilations of wavelets, are considered as adjustable parameters, there is no mathematically proven advantage of any one of those models over the others. However, practical considerations may lead one to favor one of the models over the others: prior knowledge of the type of nonlinearity that is required, local vs. nonlocal functions, ease and speed of training (see Chap. 2, section "Parameter initialization"), ease of hardware integration into silicon, etc.
The origin of parsimony can be understood qualitatively as follows. Consider a model that is linear with respect to its parameters, such as a polynomial model, e.g.,

g(x) = 4 + 2x + 4x² − 0.5x³.

The output g(x) of the model is a linear combination of the functions y = 1, y = x, y = x², y = x³, with parameters (weights) w0 = 4, w1 = 2, w2 = 4, w3 = −0.5. The shapes of those functions are fixed.
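As an illustration (not from the original text), the short Python sketch below makes "linear in the parameters" concrete: because the basis functions 1, x, x², x³ are fixed, the weights of such a model can be recovered by ordinary least squares; the data range and sample size are arbitrary choices.

```python
import numpy as np

# Illustrative sketch: a model that is linear in its parameters.
# The basis functions 1, x, x^2, x^3 are fixed; only the weights are free.
x = np.linspace(-2.0, 2.0, 50)
y = 4 + 2 * x + 4 * x**2 - 0.5 * x**3          # values of g(x)

# Design matrix: one column per fixed basis function.
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Because g is linear in the weights, ordinary least squares recovers them exactly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)   # approximately [4.0, 2.0, 4.0, -0.5]
```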
Consider a neural model as shown on Fig. 1.8, for which the equation is

g(x) = 0.5 − 2 tanh(10x + 5) + 3 tanh(x + 0.25) − 2 tanh(3x − 0.25).

This model is also a linear combination of functions (y = 1, y = tanh(10x + 5), y = tanh(x + 0.25), y = tanh(3x − 0.25)), but the shapes of these functions depend on the values of the parameters of the connections between the inputs and the hidden neurons. Thus, instead of combining functions whose shapes are fixed, one combines functions whose shapes are adjustable through the parameters of some connections. That provides extra degrees of freedom, which can be exploited to use a smaller number of functions, hence a smaller number of parameters. That is the very essence of parsimony.
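By contrast, the following sketch (again illustrative, with a hypothetical parameter layout) evaluates the neural model of Fig. 1.8: the output is still a weighted sum, but the shape of each tanh term is set by the parameters of the connections between the input and the hidden neurons.

```python
import numpy as np

def neural_model(x, w):
    """Linear combination of tanh functions whose shapes (slope and position)
    depend on the input-to-hidden parameters.  Hypothetical layout:
    w = [bias, a1, b1, c1, a2, b2, c2, a3, b3, c3]."""
    out = w[0]
    for a, b, c in zip(w[1::3], w[2::3], w[3::3]):
        out = out + a * np.tanh(b * x + c)   # shape of tanh(b*x + c) changes with b and c
    return out

# Parameters of g(x) = 0.5 - 2 tanh(10x + 5) + 3 tanh(x + 0.25) - 2 tanh(3x - 0.25)
w = [0.5, -2.0, 10.0, 5.0, 3.0, 1.0, 0.25, -2.0, 3.0, -0.25]
print(neural_model(np.linspace(-1.0, 1.0, 5), w))
```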
1.1.3.3 An Elementary Example
Consider the function

y = 16.71x² − 0.075.

We sample 20 equally spaced points that are used for training a multilayer Perceptron with two hidden neurons whose nonlinearity is tan⁻¹, as shown on Fig. 1.9(a). Training is performed with the Levenberg-Marquardt algorithm.
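The book gives no code for this experiment; the sketch below shows one way it could be reproduced, assuming a sampling interval of [−1, 1] (not specified above), a hypothetical parameter layout for the small network, and scipy.optimize.least_squares with method="lm" as the Levenberg-Marquardt optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

# 20 equally spaced training points sampled from y = 16.71 x^2 - 0.075
# (the interval [-1, 1] is an assumption).
x = np.linspace(-1.0, 1.0, 20)
y = 16.71 * x**2 - 0.075

def model(p, x):
    """MLP with two hidden neurons (arctan nonlinearity) and a linear output.
    Hypothetical parameter layout: p = [w1, b1, w2, b2, v0, v1, v2]."""
    h1 = np.arctan(p[0] * x + p[1])
    h2 = np.arctan(p[2] * x + p[3])
    return p[4] + p[5] * h1 + p[6] * h2

def residuals(p):
    return model(p, x) - y

# Small random initialization, then Levenberg-Marquardt on the residuals.
rng = np.random.default_rng(0)
p0 = rng.normal(scale=0.1, size=7)
fit = least_squares(residuals, p0, method="lm")
print("root-mean-square error:", np.sqrt(np.mean(fit.fun**2)))
```

In practice, training would be repeated from several random initializations and the best result kept, since the cost function of a model that is nonlinear in its parameters generally has local minima.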