Pairwise Classification
For difficult problems, it is often much safer to split a C-class classification problem into C(C − 1)/2 pairwise classification problems (a short enumeration sketch follows the list below), for the following reasons:
• When performing pairwise classification, the designer can take advantage of many theoretical results and algorithms pertaining to linear class separation; they are fully developed in Chap. 6; we give a cursory introduction to that material in the next section, entitled "Linear Separability."
• The resulting networks are much more compact, with fast training and simple analysis; since each network has a single output, its probabilistic interpretation is trivial.
• The features that are relevant for separating class A from class B are not necessarily identical to the features that are relevant for separating class A from class C; therefore, each classifier has only the inputs that are relevant to its own task, whereas a multilayer Perceptron for global separation must have all input features that are relevant for the discrimination of all classes; the feature selection techniques that are described in Chap. 2 can be used in a very straightforward fashion.
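As a minimal illustration of the combinatorics, the following Python sketch (the labels are hypothetical, not from the text) enumerates the C(C − 1)/2 pairwise problems for C = 4 classes:

```python
# Enumerate all pairwise classification problems for a C-class task.
from itertools import combinations

C = 4
classes = list(range(C))
pairs = list(combinations(classes, 2))
print(len(pairs))  # C * (C - 1) // 2 = 6
print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```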
Once the C(C − 1)/2 posterior probabilities are estimated, possibly with simple linear separators (neural networks with no hidden neuron), the posterior probability of class C_i for a feature vector x is computed as

$$\Pr(C_i \mid x) = \frac{1}{\displaystyle\sum_{j=1,\, j \neq i}^{C} \frac{1}{\Pr_{ij}} \,-\, (C-2)},$$
where C is the number of classes and Pr_ij is the posterior probability of class i or class j, as estimated by the neural network that separates class C_i from class C_j.
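As a sketch of how this combination rule might be implemented (the function and variable names are illustrative, not from the text), assuming the pairwise estimates are stored in a matrix with P[i, j] = Pr_ij:

```python
import numpy as np

def class_posterior(i, P):
    """Combine pairwise estimates into Pr(C_i | x) using
    Pr(C_i | x) = 1 / (sum_{j != i} 1 / Pr_ij - (C - 2)).
    P[i, j] holds Pr_ij; by construction P[j, i] = 1 - P[i, j]."""
    C = P.shape[0]
    s = sum(1.0 / P[i, j] for j in range(C) if j != i)
    return 1.0 / (s - (C - 2))

# Consistency check: if Pr_ij = p_i / (p_i + p_j) for true posteriors p,
# the combination rule recovers p exactly.
p = np.array([0.5, 0.3, 0.15, 0.05])
C = len(p)
P = np.array([[p[i] / (p[i] + p[j]) if i != j else 0.5
               for j in range(C)] for i in range(C)])
print([round(class_posterior(i, P), 4) for i in range(C)])
# -> [0.5, 0.3, 0.15, 0.05]
```

The check exercises the identity behind the formula: when Pr_ij = p_i/(p_i + p_j), the sum of the reciprocals equals (C − 1) + (1 − p_i)/p_i, so subtracting (C − 2) leaves exactly 1/p_i.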
Linear Separability
Two sets of patterns, described in an n-dimensional feature space, belonging to
two different classes, are said to be “linearly separable” if they lie on different
sides of a hyperplane in feature space.
If two sets of examples are linearly separable, a neural network made of a single neuron (also termed a perceptron) can separate them. Consider a neuron with a sigmoid activation function with n inputs; its output is given by

$$y = \tanh\left(\sum_{i=1}^{n} w_i x_i\right).$$

The simple relation P = (y + 1)/2 provides an interpretation of the output of the classifier as a posterior probability. From Bayes decision rule, the equation of the boundary between the classes is given by P = 0.5, or equivalently y = 0. Therefore, the separating surface is a hyperplane in feature space.
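The following sketch (names and values are illustrative) implements such a single tanh neuron and its probabilistic reading; the decision boundary P = 0.5 is precisely the hyperplane where the weighted sum vanishes:

```python
import numpy as np

def neuron_posterior(x, w):
    """Single neuron with tanh activation: y = tanh(w . x).
    P = (y + 1) / 2 is read as the posterior probability of the
    positive class; P = 0.5 (i.e. y = 0) is the separating
    hyperplane w . x = 0."""
    y = np.tanh(np.dot(w, x))
    return (y + 1.0) / 2.0

w = np.array([1.0, -2.0, 0.5])   # illustrative weights
x = np.array([0.3, 0.1, 0.2])    # illustrative feature vector
print(neuron_posterior(x, w))    # ~0.60: above 0.5, positive class
```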