that case, usually the class is decided through a vote, based on the value of
the potential of the output neuron. The underlying rationale, called winner
takes all (WTA), is that the larger the potential on the output neuron, the
more confident we are in its classification.
We will show below that the probabilistic interpretation of the classifica-
tion is based on the distance of the examples to the discriminant surfaces, that
is, the absolute value of the potential divided by the norm of the weight vec-
tor. Therefore, our confidence in a classification should be based on distances
and not on bare potentials, unless the weights are normalized. But a deeper
problem posed by the WTA procedure is the following: the output unit only
reflects the properties of the internal representations. Our confidence should
depend on the distances of the input vector to the discriminant surfaces in
input space, which are proportional to the potentials of the hidden neurons.
It may happen that the input pattern lies so close to one discriminant surface
in input space that its class is uncertain. However, its internal representation
may have a large stability (see Fig. 6.18), and thus win the WTA vote
against the other classifiers.
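As a purely illustrative sketch (the weights, the input and the numpy code below are ours, not part of the text), the following lines contrast the two decision rules on a two-class toy example: WTA applied to the raw potentials w · x + b, and WTA applied to the distances |w · x + b| / ||w|| to the discriminant surfaces. Because the first unit has a much larger weight vector, the two rules disagree here: the raw potential favors class 0, whereas the distance criterion favors class 1.

    import numpy as np

    # Two linear output units (one discriminant surface w.x + b = 0 per class);
    # the weights of the first unit are deliberately left un-normalized.
    W = np.array([[4.0, 0.0],
                  [0.3, 0.4]])
    b = np.array([-0.2, 0.05])
    x = np.array([0.1, 0.2])          # input pattern

    potentials = W @ x + b                                       # raw potentials
    distances = np.abs(potentials) / np.linalg.norm(W, axis=1)   # |v| / ||w||

    print("WTA on potentials:", int(np.argmax(potentials)))      # picks class 0
    print("WTA on distances :", int(np.argmax(distances)))       # picks class 1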
Another way of dealing with the problem of multiple classes is to construct
trees of neural networks. To this end, we choose a sequence of classes in an
arbitrary order, for example {K, 2, ..., 1}, and we learn the discrimination
between the first class and the K − 1 others. In our example, we may define
targets y = +1 for the examples of the first class (that is, class K), and
y = −1 for the others. Then, we restrict the training set to the patterns of
the classes not yet discriminated ({2, ..., 1} in our example), and we learn the separation
of class 2 from the others. The procedure is repeated until the two remaining
classes are separated. One appealing feature of this heuristic is that the successive
training sets have decreasing sizes. The resulting network has a tree structure.
In order to classify a new input, it is first presented to the first network.
If the output is σ = +1, the class is K. Otherwise (σ = −1) the pattern is
presented as input to the second network. The procedure stops as soon as
one network recognizes (output σ = +1) the pattern.
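The following sketch (an assumption of ours, not taken from the text: it uses a plain perceptron rule and illustrative names such as train_linear_classifier, train_tree and classify_with_tree) makes the tree procedure explicit for binary classifiers with outputs in {−1, +1}.

    import numpy as np

    def train_linear_classifier(X, y, epochs=100, lr=0.1):
        # Plain perceptron rule on targets y in {-1, +1}; any trainable
        # two-class classifier could be used instead.
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (xi @ w + b) <= 0:   # misclassified (or on the surface)
                    w = w + lr * yi * xi
                    b = b + lr * yi
        return w, b

    def train_tree(X, labels, class_sequence):
        # One classifier per class of the sequence except the last; after each
        # step the training set is restricted to the classes not yet separated.
        classifiers = []
        for c in class_sequence[:-1]:
            y = np.where(labels == c, 1.0, -1.0)
            classifiers.append((c, train_linear_classifier(X, y)))
            keep = labels != c
            X, labels = X[keep], labels[keep]
        return classifiers, class_sequence[-1]

    def classify_with_tree(x, classifiers, last_class):
        # Present the pattern to the networks in order; stop at the first
        # network whose output is sigma = +1.
        for c, (w, b) in classifiers:
            if x @ w + b > 0:
                return c
            # otherwise (sigma = -1): pass the pattern to the next network
        return last_class

With class_sequence chosen as {K, 2, ..., 1} above, the first classifier separates class K from all the others, and a pattern receives the last class of the sequence only if every network answers σ = −1.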
Since the sequence of classes selected at the beginning is arbitrary, in
principle one should compare the outputs of different trees, each tree
corresponding to a different sequence
of classes. However, if the number of classes is large (typically for K > 4),
this method becomes impractical. Another solution was proposed in the section
“methodology” of Chap. 1: if the classes are not mutually linearly separable, one
may resort to pairwise separation. For a problem with K classes, this requires
the construction of K(K − 1)/2 classifiers, which in many practical applications
turn out to be linear. Since there is no arbitrary sequence chosen a priori, there
is no need to compare the outputs of the K! possible trees. One advantage of this
solution is that one can use different descriptors for the different separations,
which may simplify the problem. We have shown in Chap. 1 how to estimate
the probability that a given pattern belongs to each of the possible classes,
based on the results obtained in the pairwise separations.
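As a rough sketch of the pairwise strategy (again an illustration, not the method of Chap. 1: a least-squares fit stands in for the linear separations, and a simple majority vote replaces the probability estimation mentioned above), one may proceed as follows.

    import numpy as np
    from itertools import combinations

    def train_pairwise(X, labels, classes):
        # One linear separator per unordered pair of classes: K(K-1)/2 in total.
        # A least-squares fit on +/-1 targets stands in for any linear method.
        classifiers = {}
        for a, b in combinations(classes, 2):
            mask = (labels == a) | (labels == b)
            Xab = np.hstack([X[mask], np.ones((int(mask.sum()), 1))])  # bias column
            y = np.where(labels[mask] == a, 1.0, -1.0)
            w, *_ = np.linalg.lstsq(Xab, y, rcond=None)
            classifiers[(a, b)] = w
        return classifiers

    def classify_pairwise(x, classifiers):
        # Majority vote over the pairwise decisions (a crude substitute for the
        # probability estimates of Chap. 1).
        votes = {}
        xb = np.append(x, 1.0)
        for (a, b), w in classifiers.items():
            winner = a if xb @ w > 0 else b
            votes[winner] = votes.get(winner, 0) + 1
        return max(votes, key=votes.get)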