Pairwise Classification
For difficult problems, it is often much safer to split a C-class classification problem into C(C − 1)/2 pairwise classification problems (for instance, a four-class problem gives rise to six pairwise classifiers), for the following reasons:
- When performing pairwise classification, the designer can take advantage of many theoretical results and algorithms pertaining to linear class separation; they are fully developed in Chap. 6; we give a cursory introduction to that material in the next section, entitled Linear Separability.
- The resulting networks are much more compact, with fast training and simple analysis; since each network has a single output, its probabilistic interpretation is trivial.
- The features that are relevant for separating class A from class B are not necessarily identical to those that are relevant for separating class A from class C; therefore, each classifier has only the inputs that are relevant to its own task, whereas a multilayer Perceptron for global separation must have all input features that are relevant for the discrimination of all classes; the feature selection techniques that are described in Chap. 2 can be used in a very straightforward fashion (see the sketch after this list).
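The decomposition itself is straightforward to enumerate. As a sketch (the class names and the feature-subset map below are invented for illustration; the classifiers themselves are omitted):

```python
from itertools import combinations

classes = ["A", "B", "C", "D"]       # C = 4 classes
pairs = list(combinations(classes, 2))
print(len(pairs))                    # C(C - 1)/2 = 6 pairwise problems

# Hypothetical feature subsets: each pairwise classifier keeps only the
# inputs relevant to its own two-class task (cf. the last point above).
relevant = {("A", "B"): [0, 2], ("A", "C"): [1]}
for ci, cj in pairs:
    feats = relevant.get((ci, cj), "all features")
    print(f"separator {ci} vs {cj} uses features: {feats}")
```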
Once the C(C − 1)/2 posterior probabilities are estimated, possibly with simple linear separators (neural networks with no hidden neuron), the posterior probability of class C_i for a feature vector x is computed as
\[
\Pr(C_i \mid x) \;=\; \frac{1}{\displaystyle\sum_{j=1,\, j \neq i}^{C} \frac{1}{\Pr_{ij}} \;-\; (C - 2)},
\]
where C is the number of classes and Pr_ij is the posterior probability of class C_i, as estimated by the neural network that separates class C_i from class C_j.
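A minimal Python sketch of this combination rule (the helper name pairwise_posterior and the array pr, whose entry pr[i, j] plays the role of Pr_ij, are assumptions made for this illustration, not part of the text):

```python
import numpy as np

def pairwise_posterior(pr, i):
    """Posterior of class i from pairwise estimates, using
    Pr(C_i|x) = 1 / (sum_{j != i} 1/Pr_ij - (C - 2))."""
    C = pr.shape[0]
    return 1.0 / (sum(1.0 / pr[i, j] for j in range(C) if j != i) - (C - 2))

# Consistency check with C = 3 classes of true posteriors (0.5, 0.3, 0.2):
# ideal pairwise estimates are then Pr_ij = p_i / (p_i + p_j).
p = np.array([0.5, 0.3, 0.2])
C = len(p)
pr = np.array([[p[i] / (p[i] + p[j]) if i != j else 0.5 for j in range(C)]
               for i in range(C)])

print([round(pairwise_posterior(pr, i), 3) for i in range(C)])  # [0.5, 0.3, 0.2]
```

With consistent pairwise estimates, the rule recovers the true posteriors exactly, as the check above illustrates.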
Linear Separability
Two sets of patterns, described in an n-dimensional feature space, belonging to
two different classes, are said to be “linearly separable” if they lie on different
sides of a hyperplane in feature space.
If two sets of examples are linearly separable, a neural network made of a single neuron (also termed a perceptron) can separate them. Consider a neuron with a sigmoid activation function and n inputs; its output is given by
\[
y = \tanh\!\left(\sum_{i=1}^{n} w_i x_i\right).
\]
The simple relation P = (y + 1)/2 provides an interpretation of the output of the classifier as a posterior probability. From Bayes decision rule, the equation of the boundary between the classes is given by P = 0.5, or equivalently y = 0. Therefore, the separating surface is a hyperplane in feature space.
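To make this probabilistic reading concrete, here is a minimal sketch of a single tanh neuron acting as a classifier; the weights and input points are invented for the example:

```python
import numpy as np

def neuron_output(w, x):
    """Single tanh neuron: y = tanh(sum_i w_i * x_i)."""
    return np.tanh(np.dot(w, x))

# Hypothetical weights for a 2-input neuron; the separating hyperplane
# is the line w[0]*x1 + w[1]*x2 = 0 (where y = 0, i.e. P = 0.5).
w = np.array([1.0, -2.0])

for x in (np.array([2.0, 0.5]), np.array([0.5, 2.0])):
    y = neuron_output(w, x)
    P = (y + 1.0) / 2.0               # posterior probability of class A
    label = "A" if P > 0.5 else "B"   # Bayes decision rule: threshold at 0.5
    print(x, round(P, 3), label)
```

Any point with a weighted sum of zero yields y = 0 and P = 0.5; such points lie exactly on the separating hyperplane.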