function) that individually handles pieces of complex problems; the weighted links between neurons determine the direction of data flow and the contribution of the 'from' neuron to the 'to' neuron. These weights are determined through an iterative training process that learns from known samples, adjusting the weights between neurons until the error of the performance function is minimized. While the MLP structure and the concept of the back-propagation algorithm are relatively simple, network topology and training parameter settings can complicate the overall performance of neural networks for image classification (see Foody and Arora, 1997; Kavzoglu and Mather, 2003; Mas and Flores, 2008).
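To make the weighted links and the iterative weight adjustment concrete, the following is a minimal sketch of a single back-propagation training step for a one-hidden-layer MLP. The layer sizes, learning rate, and sample values are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 3 hidden neurons, 2 output classes
W1 = rng.normal(scale=0.1, size=(4, 3))   # weighted links: input -> hidden
W2 = rng.normal(scale=0.1, size=(3, 2))   # weighted links: hidden -> output
lr = 0.1                                   # assumed learning rate

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# One known training sample: input features x and target output t
x = np.array([0.2, 0.7, 0.1, 0.5])
t = np.array([1.0, 0.0])

# Forward pass: each link carries the 'from' neuron's output, scaled by its weight
h = logistic(x @ W1)        # hidden-layer outputs
y = logistic(h @ W2)        # network outputs

# Back-propagation: adjust the weights to reduce the squared error
err = y - t
delta2 = err * y * (1 - y)              # output-layer error signal
delta1 = (delta2 @ W2.T) * h * (1 - h)  # error propagated back to the hidden layer
W2 -= lr * np.outer(h, delta2)
W1 -= lr * np.outer(x, delta1)

print("squared error:", float(err @ err))
```

Repeating this step over many samples and many iterations is the training process described above.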
7.2.2 Network topology
The topology of a neural network is critical to solving problems with reasonable training time and satisfactory performance. For MLP networks, the topology is determined by the number of hidden layers, neurons and connections, and the type of activation function.
7.2.2.1 Number of hidden layers, neurons and connections
The complexity of neural networks is largely defined by the total number of input features, the number of output classes, and the number of hidden layers and neurons (Duda, Hart and Stork, 2001). While the first two factors are less flexible for a particular classification problem, the last two can be adjusted to find an appropriate network size. A trade-off is needed between the processing capacity of the hidden layers and the training time required. Large neural networks with more hidden layers may represent complex non-linear relationships more effectively, but they usually demand more training samples and longer training time, and they are more likely to become trapped in undesirable local minima or to overfit the training data. On the other hand, compact neural networks may be easier to train but can oversimplify the phenomena and lead to unsatisfactory results. Several prior studies investigated the optimal number of hidden layers, but their recommendations are not consistent. For example, Shupe and Marsh (2004) recommended single-hidden-layer networks, while Civco (1993) preferred two-hidden-layer networks. Kanellopoulos and Wilkinson (1997) suggested that single-hidden-layer networks are suitable for most classification problems but that two-hidden-layer networks may be more appropriate for applications with more than 20 output classes. Other researchers concluded that the number of hidden layers may not have a significant influence on classification accuracy (e.g., Foody and Arora, 1997; Kavzoglu and Mather, 2003).
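One common way to weigh these topology choices is to compare candidate configurations on held-out data. The sketch below does this with scikit-learn's MLPClassifier for convenience; the synthetic data, the candidate layer sizes, and the other settings are assumptions for illustration, not software or values discussed in the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a multispectral training set: 6 features, 5 classes
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           n_redundant=2, n_classes=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Trial-and-error over candidate topologies: one vs. two hidden layers,
# few vs. many neurons per layer
for sizes in [(5,), (20,), (10, 10), (30, 30)]:
    net = MLPClassifier(hidden_layer_sizes=sizes, activation='logistic',
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(sizes, 'validation accuracy: %.3f' % net.score(X_val, y_val))
```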
The number of neurons for the input, hidden, and output
layers determines the number of weighted links, particularly for
a fully connected network. Since the weight for each link is
determined through neural training, more links tend to increase
the training time. Thus, every effort should be made to minimize
unnecessary neurons in order to improve the efficiency in neural
training. Various feature extraction techniques, such as principal component analysis and discriminant analysis, have been used to reduce the data dimensionality and hence the number of neurons in the input layer (e.g., Benediktsson and Sveinsson, 1997; Liu and Lathrop, 2002).
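As an illustration of such dimensionality reduction, the sketch below projects a hypothetical set of correlated input features onto their leading principal components before they reach the input layer. The feature counts and the retained-variance cutoff are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical image data: 10,000 pixels, each with 12 correlated input features
base = rng.normal(size=(10_000, 3))
features = base @ rng.normal(size=(3, 12)) + 0.05 * rng.normal(size=(10_000, 12))

# Keep enough principal components to explain 99% of the variance (assumed
# cutoff), shrinking the input layer from 12 neurons to only a few
pca = PCA(n_components=0.99)
reduced = pca.fit_transform(features)
print('input neurons needed:', reduced.shape[1])  # typically 3 here
```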
While the number of output neurons can be defined according to the research objective in a specific application, a more challenging issue is choosing the number of neurons in the hidden layers. If there are too few neurons in the hidden layers, the network may be unable to approximate very complex functions because of insufficient degrees of freedom. On the other hand, if there are too many neurons, the network tends to have a large number of degrees of freedom, which may lead to overtraining and hence poor generalization performance (Rojas, 1996). Thus, it is crucial to find the 'optimum' number of neurons in the hidden layers that adequately captures the relationships in the training data. This optimization can be achieved by trial and error or by systematic approaches such as pruning and constructive algorithms (Reed, 1993).

7.2.2.2 Activation function
The activation function is an algorithm that transforms the weighted sum of a neuron's inputs into its output. Duda, Hart and Stork (2001) suggested that a non-linear activation function should be used to deal with non-linear relationships; otherwise, neural networks would provide no computational power beyond linear classifiers. The commonly used non-linear activation functions include the logistic sigmoid (log-sig) function (Equation 7.1) and the tangent sigmoid (tan-sig) function (Equation 7.2).

f_1(x) = \frac{1}{1 + e^{-(1-a)x}}    (7.1)

f_2(x) = \frac{e^{(1-a)x} - e^{-(1-a)x}}{e^{(1-a)x} + e^{-(1-a)x}}    (7.2)
where f_1(x) refers to the log-sig activation function; f_2(x) is the tan-sig activation function; x refers to the input data; and a is the user-defined training threshold.
Several researchers examined different activation functions and concluded that the tan-sig function performed better (e.g., Ozkan and Erbek, 2002; Shupe and Marsh, 2004). However, little research has been conducted to investigate how the training threshold affects the performance of different activation functions. Based on Equations 7.1 and 7.2, the training threshold determines the size of the contribution of the input data to the output of the neuron; that is, it defines the slope of the activation functions in their midrange (Fig. 7.2). Therefore, the same type of activation function with different training threshold values may perform differently in image classification.
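The short sketch below implements the log-sig and tan-sig functions of Equations 7.1 and 7.2 and shows how the threshold a changes the midrange slope; the sample threshold values are illustrative assumptions.

```python
import numpy as np

def log_sig(x, a):
    """Logistic sigmoid (Equation 7.1): output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(1.0 - a) * x))

def tan_sig(x, a):
    """Tangent sigmoid (Equation 7.2): output in (-1, 1)."""
    e_pos = np.exp((1.0 - a) * x)
    e_neg = np.exp(-(1.0 - a) * x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Midrange slope at x = 0 is (1 - a)/4 for the log-sig and (1 - a) for the
# tan-sig, so a larger threshold a gives a flatter activation curve
for a in (0.0, 0.5, 0.9):  # illustrative threshold values
    print(f"a = {a}: log-sig(1) = {log_sig(1.0, a):.3f}, "
          f"tan-sig(1) = {tan_sig(1.0, a):.3f}")
```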
7.2.3 Neural training
Training is a learning process by which the connection weights are adjusted until the network is deemed optimal. This involves the use of training samples, an error measure, and a learning algorithm. Training samples, with both input and output data, are presented to the network over many iterations; they should not only be large in number but also representative, to ensure sufficient generalization ability. There are several different error measures, such as the mean squared error (MSE), the mean squared relative error (MSRE), the coefficient of efficiency (CE),
and the coefficient of determination (r²). The MSE has been the most widely used of these.