[5.7]
The f′ terms are the derivatives of the activation functions; the BP
algorithm can therefore only be applied to networks with differentiable
activation functions.
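For example, the widely used logistic (sigmoid) activation is differentiable everywhere, and its derivative has a simple closed form; a minimal Python sketch (the function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Logistic activation: smooth and differentiable for all x
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # Its derivative, f'(x) = f(x)(1 - f(x)), is what the BP
    # weight-update rule propagates backwards through the network
    s = sigmoid(x)
    return s * (1.0 - s)
```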
Note that when the BP algorithm is applied, the error signal is transferred
backwards, from the output layer to the hidden layers, in contrast to the
feed-forward transmission of signals (from input to output via the hidden
layers). The most frequently used ANN is the multi-layer perceptron
feed-forward network trained with the BP learning algorithm. The main
problem in the training of ANNs is the possibility that the network will
get stuck in a local minimum, and several techniques have been developed
to resolve this issue (Erb, 1993).
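A minimal sketch may make the forward and backward signal flow concrete. This is an illustrative Python implementation, not the formulation from any of the cited works; it assumes a single hidden layer, sigmoid activations, a sum-of-squares error, and plain gradient descent, with the XOR data and all hyperparameter values chosen only for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, y, n_hidden=4, lr=0.5, epochs=5000):
    """One-hidden-layer perceptron trained with plain BP (gradient descent)."""
    n_in, n_out = X.shape[1], y.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden -> output
    b2 = np.zeros(n_out)
    for _ in range(epochs):
        # Feed-forward pass: signals travel input -> hidden -> output
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: the error signal travels output -> hidden,
        # using the sigmoid derivative f'(x) = f(x)(1 - f(x))
        delta2 = (out - y) * out * (1.0 - out)
        delta1 = (delta2 @ W2.T) * h * (1.0 - h)
        # Gradient-descent updates of weights and biases
        W2 -= lr * (h.T @ delta2); b2 -= lr * delta2.sum(axis=0)
        W1 -= lr * (X.T @ delta1); b1 -= lr * delta1.sum(axis=0)
    return W1, b1, W2, b2

# Example: XOR, a classic problem that requires a hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
weights = train_bp(X, y)
```

With an unlucky weight initialization the same loop can stall in a local minimum, which is exactly the problem the techniques discussed next are meant to address.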
Various other techniques, apart from the BP algorithm, are available
for weight adjustment during the training process (i.e. error
minimization), such as the steepest descent method, the conjugate
gradient method, the simulated annealing algorithm, Newton's method,
the Gauss-Newton method, etc. (Peh et al., 2000; Reis et al., 2004). The
conjugate gradient method is often combined with the simulated annealing
algorithm in order to locate the global minimum among multiple local
minima (Peh et al., 2000); a sketch of this two-stage idea follows below.
Genetic algorithms (GAs) can also be used to search for the global
error minimum.
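As an illustration of such a two-stage strategy, the sketch below first applies a simulated-annealing-type global optimizer and then polishes the result with the conjugate gradient method, both taken from SciPy. The tiny network, the weight bounds, and the iteration count are assumptions made for the example, not settings from the cited studies:

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack(w, n_in, n_h):
    """Split the flat parameter vector into weight matrices and biases."""
    i = n_in * n_h
    W1, b1 = w[:i].reshape(n_in, n_h), w[i:i + n_h]
    W2, b2 = w[i + n_h:i + 2 * n_h].reshape(n_h, 1), w[-1]
    return W1, b1, W2, b2

def sse(w, X, y, n_h):
    """Sum-of-squares network error as a function of the flat weight vector."""
    W1, b1, W2, b2 = unpack(w, X.shape[1], n_h)
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(np.sum((out - y) ** 2))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
n_h = 3
n_w = X.shape[1] * n_h + 2 * n_h + 1   # total adjustable parameters

# Stage 1: simulated annealing explores the error surface globally,
# reducing the risk of settling into a poor local minimum.
coarse = dual_annealing(sse, bounds=[(-10, 10)] * n_w,
                        args=(X, y, n_h), maxiter=200)
# Stage 2: conjugate gradient refines the annealing solution locally.
fine = minimize(sse, coarse.x, args=(X, y, n_h), method='CG')
print(fine.fun)   # final sum-of-squares error
```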
It is often difficult to determine the optimal number of units (neurons)
in the hidden layer(s). The number of neurons in the input layer
corresponds to the number of independent variables, whereas the number
of neurons in the output layer equals the number of dependent variables.
The number of neurons in the hidden layers depends on many factors, such
as the complexity of the problem studied, the number of training data, the
type of activation function, the training algorithm, the required prediction
accuracy, etc. (Sun et al., 2003). An approach introduced by Carpenter and
Hoffman (1995) relates the required number of training data pairs to the
numbers of units in the input, hidden, and output layers:
n_s = β[n_h(n_i + 1) + n_o(n_h + 1)]                [5.8]
where n_h is the number of hidden units, n_i is the number of input units,
n_o is the number of output units, and n_s is the number of training data
pairs. The constant β is a parameter expressing the degree of
overdetermination (Takayama et al., 2003). When β is greater than 1,
reasonable predictions can be expected, since the number of training data
pairs then exceeds the number of weights that need to be adjusted in the
neural network.
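As a short worked example of Eq. [5.8] (the function name and the choice β = 2 are illustrative, not taken from the cited sources):

```python
def required_training_pairs(n_i, n_h, n_o, beta=2.0):
    # Eq. [5.8]: n_s = beta * [n_h(n_i + 1) + n_o(n_h + 1)]
    # The bracketed term counts every adjustable weight and bias.
    n_weights = n_h * (n_i + 1) + n_o * (n_h + 1)
    return beta * n_weights

# 5 inputs, 4 hidden units, 1 output: 4*6 + 1*5 = 29 parameters,
# so beta = 2 suggests about 58 training pairs.
print(required_training_pairs(n_i=5, n_h=4, n_o=1, beta=2.0))   # -> 58.0
```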
 