function) that individually handles pieces of complex problems; the weighted links between neurons determine the direction of data flow and the contribution of the 'from' neuron to the 'to' neuron. These weights are determined through an iterative training process that learns from known samples, adjusting the weights between neurons until the error of the performance function is minimized. While the MLP structure and the concept of the back-propagation algorithm are relatively simple, network topology and training parameter settings can complicate the overall performance of neural networks for image classification (see Foody and Arora, 1997; Kavzoglu and Mather, 2003; Mas and Flores, 2008).
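To make the weighted links and the iterative weight adjustment concrete, the following is a minimal sketch of a single back-propagation training step for a one-hidden-layer MLP. The layer sizes, learning rate, and sample values are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 3 hidden neurons, 2 output classes
W1 = rng.normal(scale=0.1, size=(4, 3))   # weighted links: input -> hidden
W2 = rng.normal(scale=0.1, size=(3, 2))   # weighted links: hidden -> output
lr = 0.1                                   # assumed learning rate

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# One known training sample: input features x and target output t
x = np.array([0.2, 0.7, 0.1, 0.5])
t = np.array([1.0, 0.0])

# Forward pass: each link carries the 'from' neuron's output, scaled by its weight
h = logistic(x @ W1)        # hidden-layer outputs
y = logistic(h @ W2)        # network outputs

# Back-propagation: adjust the weights to reduce the squared error
err = y - t
delta2 = err * y * (1 - y)              # output-layer error signal
delta1 = (delta2 @ W2.T) * h * (1 - h)  # error propagated back to the hidden layer
W2 -= lr * np.outer(h, delta2)
W1 -= lr * np.outer(x, delta1)

print("squared error:", float(err @ err))
```

Repeating this step over many samples and many iterations is the training process described above.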
7.2.2 Network topology
The topology of a neural network is critical to solving problems with reasonable training time and satisfactory performance. For MLP networks, the topology is determined by the number of hidden layers, neurons and connections, and the type of activation function.
7.2.2.1 Number of hidden layers, neurons and connections
The complexity of neural networks is largely defined by the total number of input features, the number of output classes, and the number of hidden layers and neurons (Duda, Hart and Stork, 2001). While the first two factors are less flexible for a particular classification problem, the last two can be adjusted to find an appropriate network size. A trade-off is needed between the processing capacity of the hidden layers and the training time required. Large neural networks with more hidden layers may represent complex non-linear relationships more effectively, but they usually demand more training samples and longer training time, and they are more likely to become trapped in undesirable local minima or to overfit the training data. On the other hand, compact neural networks may be easier to train but can oversimplify the phenomena and lead to unsatisfactory results. Several prior studies investigated the optimal number of hidden layers, but their recommendations are not consistent. For example, Shupe and Marsh (2004) recommended single-hidden-layer networks, while Civco (1993) preferred two-hidden-layer networks. Kanellopoulos and Wilkinson (1997) suggested that single-hidden-layer networks are suitable for most classification problems but that two-hidden-layer networks may be more appropriate for applications with more than 20 output classes. Other researchers concluded that the number of hidden layers may not have a significant influence on classification accuracy (e.g., Foody and Arora, 1997; Kavzoglu and Mather, 2003).
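One common way to weigh these topology choices is to compare candidate configurations on held-out data. The sketch below does this with scikit-learn's MLPClassifier for convenience; the synthetic data, the candidate layer sizes, and the other settings are assumptions for illustration, not software or values discussed in the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a multispectral training set: 6 features, 5 classes
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           n_redundant=2, n_classes=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Trial-and-error over candidate topologies: one vs. two hidden layers,
# few vs. many neurons per layer
for sizes in [(5,), (20,), (10, 10), (30, 30)]:
    net = MLPClassifier(hidden_layer_sizes=sizes, activation='logistic',
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(sizes, 'validation accuracy: %.3f' % net.score(X_val, y_val))
```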
The number of neurons for the input, hidden, and output
layers determines the number of weighted links, particularly for
a fully connected network. Since the weight for each link is
determined through neural training, more links tend to increase
the training time. Thus, every effort should be made to minimize
unnecessary neurons in order to improve the efficiency in neural
training. Various feature extraction techniques, such as principal component analysis and discriminant analysis, have been used to reduce the data dimensionality and hence the number of neurons in the input layer (e.g., Benediktsson and Sveinsson, 1997; Liu and Lathrop, 2002).
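As an illustration of such dimensionality reduction, the sketch below projects a hypothetical set of correlated input features onto their leading principal components before they reach the input layer. The feature counts and the retained-variance cutoff are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical image data: 10,000 pixels, each with 12 correlated input features
base = rng.normal(size=(10_000, 3))
features = base @ rng.normal(size=(3, 12)) + 0.05 * rng.normal(size=(10_000, 12))

# Keep enough principal components to explain 99% of the variance (assumed
# cutoff), shrinking the input layer from 12 neurons to only a few
pca = PCA(n_components=0.99)
reduced = pca.fit_transform(features)
print('input neurons needed:', reduced.shape[1])  # typically 3 here
```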
While the number of output neurons can be defined according to the research objective in a specific application, a more challenging issue is choosing the number of neurons in the hidden layers. If there are too few neurons in the hidden layers, the network may be unable to approximate very complex functions because of insufficient degrees of freedom. On the other hand, if there are too many neurons, the network tends to have a large number of degrees of freedom, which may lead to overtraining and hence poor generalization performance (Rojas, 1996). Thus, it is crucial to find the 'optimum' number of neurons in the hidden layers that adequately captures the relationships in the training data. This optimization can be achieved by trial and error or by systematic approaches such as pruning and constructive algorithms (Reed, 1993).

7.2.2.2 Activation function
The activation function is an algorithm that transforms the weighted sum of a neuron's inputs into its output. Duda, Hart and Stork (2001) suggested that a non-linear activation function should be used to deal with non-linear relationships; otherwise, neural networks would provide no computational power beyond linear classifiers. The commonly used non-linear activation functions include the logistic sigmoid (log-sig) function (Equation 7.1) and the tangent sigmoid (tan-sig) function (Equation 7.2).

f_1(x) = \frac{1}{1 + e^{-(1-a)x}}    (7.1)

f_2(x) = \frac{e^{(1-a)x} - e^{-(1-a)x}}{e^{(1-a)x} + e^{-(1-a)x}}    (7.2)
where f_1(x) refers to the log-sig activation function; f_2(x) is the tan-sig activation function; x refers to the input data; and a is the user-defined training threshold.
Several researchers examined different activation functions and concluded that the tan-sig function performed better (e.g., Ozkan and Erbek, 2002; Shupe and Marsh, 2004). However, little research has been conducted to investigate how the training threshold affects the performance of different activation functions. Based on Equations 7.1 and 7.2, the training threshold determines the size of the contribution of the input data to the output of the neuron; that is, it defines the slope of the activation functions in their midrange (Fig. 7.2). Therefore, the same type of activation function with different training threshold values may perform differently in image classification.
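The short sketch below implements the log-sig and tan-sig functions of Equations 7.1 and 7.2 and shows how the threshold a changes the midrange slope; the sample threshold values are illustrative assumptions.

```python
import numpy as np

def log_sig(x, a):
    """Logistic sigmoid (Equation 7.1): output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(1.0 - a) * x))

def tan_sig(x, a):
    """Tangent sigmoid (Equation 7.2): output in (-1, 1)."""
    e_pos = np.exp((1.0 - a) * x)
    e_neg = np.exp(-(1.0 - a) * x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Midrange slope at x = 0 is (1 - a)/4 for the log-sig and (1 - a) for the
# tan-sig, so a larger threshold a gives a flatter activation curve
for a in (0.0, 0.5, 0.9):  # illustrative threshold values
    print(f"a = {a}: log-sig(1) = {log_sig(1.0, a):.3f}, "
          f"tan-sig(1) = {tan_sig(1.0, a):.3f}")
```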
7.2.3 Neural training
Training is a learning process by which the connection weights are adjusted until the network is deemed optimal. This involves the use of training samples, an error measure, and a learning algorithm. Training samples, with both input and output data, are presented to the network over many iterations; they should not only be large in number but also representative, to ensure sufficient generalization ability. There are several different error measures, such as the mean squared error (MSE), the mean squared relative error (MSRE), the coefficient of efficiency (CE),
and the coefficient of determination (r²). The MSE has been the most widely used of these.