Fig. 1.27. A multilayer perceptron with C outputs for classification. The activation functions of the output neurons are sigmoids.
There are several important differences between a multilayer perceptron
for classification and a multilayer perceptron for regression.
The activation functions of the output neurons of neural networks for modeling are usually linear. By contrast, the output neurons of neural networks for classification have nonlinear activation functions such as sigmoids: since the outputs of the neural network are probabilities, they must lie between 0 and 1 (or, after a simple linear rescaling, between −1 and +1). In Chap. 6, a theoretical justification for the use of the tanh function as the activation function of output neurons will be given.
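To make the contrast with regression networks concrete, here is a minimal sketch of the forward pass of a network like the one in Fig. 1.27, in plain Python. It assumes one hidden layer of tanh units, a bias stored as the first entry of each weight row, and sigmoid output units; the function names (`mlp_forward`, `tanh_to_unit`) and the random weights are hypothetical. It also relies on the identity σ(a) = (1 + tanh(a/2))/2, which is why sigmoid and tanh outputs are interchangeable up to a linear rescaling.

```python
import math
import random

def sigmoid(a: float) -> float:
    """Logistic sigmoid: maps any real a into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def tanh_to_unit(y: float) -> float:
    """Linearly rescale a tanh output from (-1, 1) to (0, 1)."""
    return (y + 1.0) / 2.0

def mlp_forward(x, w_hidden, w_out):
    """Forward pass: tanh hidden units, C sigmoid output units.

    Hypothetical layout: each weight row starts with a bias weight,
    which multiplies a constant input equal to 1.
    """
    xb = [1.0] + list(x)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + hidden
    # Unlike the linear outputs of a regression network, sigmoid outputs stay in (0, 1).
    return [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]

# Usage: 3 inputs, 5 hidden units, C = 2 outputs, random (hypothetical) weights.
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
w_out = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(2)]
outputs = mlp_forward([0.2, -1.0, 0.5], w_hidden, w_out)
print(outputs)
```

Whatever the weights, each output lies strictly between 0 and 1, so it can be read as an estimate of a class probability.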
For classification, minimizing the cross-entropy cost function is more natural than minimizing the least squares cost function [Hopfield 1987; Baum 1988; Hampshire 1990]; the training algorithms that will be described in Chap. 2 can readily be applied to this cost function:
$$
J = -\sum_{k}\sum_{i=1}^{C}\left[\gamma_i \ln\frac{g_i(x_k)}{\gamma_i} + (1-\gamma_i)\ln\frac{1-g_i(x_k)}{1-\gamma_i}\right],
$$
where γ_i is the desired value (0 or 1) of output i when the classifier's input is example k, described by feature vector x_k, and g_i(x_k) is the value of output i of the classifier. This function reaches its minimum, zero, when every output g_i(x_k) equals its desired value γ_i, i.e., when all examples are perfectly classified.
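A minimal sketch of the cost J, together with the derivative that gradient-based training needs. It assumes sigmoid outputs g = σ(a) of a pre-activation a, and uses the convention 0 · ln(·/0) = 0 for the terms whose coefficient vanishes; outputs must lie strictly in (0, 1) wherever they differ from the target, otherwise the logarithm is undefined. The helper names (`cross_entropy`, `output_delta`, `_t_log_ratio`) are hypothetical. Differentiating one term of J with respect to a gives simply g − γ, because σ'(a) = σ(a)(1 − σ(a)) cancels the denominators; the sketch checks this numerically.

```python
import math

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def _t_log_ratio(t: float, p: float) -> float:
    """t * ln(p / t), with the convention 0 * ln(p / 0) = 0."""
    return 0.0 if t == 0.0 else t * math.log(p / t)

def cross_entropy(targets, outputs) -> float:
    """J = -sum_k sum_i [gamma ln(g/gamma) + (1-gamma) ln((1-g)/(1-gamma))].

    targets[k][i] is gamma_i for example k; outputs[k][i] is g_i(x_k).
    """
    J = 0.0
    for gammas, gs in zip(targets, outputs):
        for gamma, g in zip(gammas, gs):
            J -= _t_log_ratio(gamma, g) + _t_log_ratio(1.0 - gamma, 1.0 - g)
    return J

def output_delta(gamma: float, a: float) -> float:
    """dJ/da for one sigmoid output g = sigmoid(a): simply g - gamma."""
    return sigmoid(a) - gamma

# Usage: two examples, C = 2 outputs.
targets = [[1, 0], [0, 1]]
print(cross_entropy(targets, [[0.9, 0.2], [0.3, 0.6]]))  # positive: imperfect outputs
print(cross_entropy(targets, [[1.0, 0.0], [0.0, 1.0]]))  # 0.0: outputs equal targets
```

With binary targets, the γ ln γ terms vanish and J reduces to the familiar form −Σ[γ ln g + (1 − γ) ln(1 − g)].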
After training, it is prudent to check that the sum of the outputs is equal to 1 for all examples. The Softmax technique [Bridle 1990] guarantees that this condition is fulfilled automatically. Of course, the issue does not arise for pairwise classifiers, which have a single output.
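A minimal sketch of the usual normalized-exponential form of softmax, g_i = exp(a_i) / Σ_j exp(a_j), assumed here to replace the independent sigmoid outputs of Fig. 1.27; the activation values are hypothetical. Subtracting the largest activation before exponentiating is a standard numerical-stability trick that leaves the result unchanged. For contrast, the sketch also sums independent sigmoids of the same activations, which need not equal 1.

```python
import math

def softmax(activations):
    """Normalized exponential: C real activations -> C outputs in (0, 1) summing to 1."""
    m = max(activations)  # shifting by the max avoids overflow; the ratios are unchanged
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

acts = [2.0, 1.0, -0.5]  # hypothetical output-layer activations for one example
p = softmax(acts)
print(sum(p))                         # 1.0 up to rounding
print(sum(sigmoid(a) for a in acts))  # independent sigmoids: about 1.99, not 1
```

Because the normalization is built into the output layer, the sum-to-one check after training becomes unnecessary.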
The question of overfitting, which we encountered in nonlinear regression, is also relevant to discrimination. If the classifier is overparameterized, it separates the patterns of the training set very accurately but generalizes poorly. Model selection techniques, such as those described in Chap. 2, must be used in order to select the best model. Essentially, one must