(A coding scheme such as T = {1, ..., c} turns out to be uncomfortable to deal with.) For instance, for c = 4 the classifier would have four outputs, {z_1, z_2, z_3, z_4}, and class ω_2, say, would be represented by [0 1 0 0]^T or [-1 1 -1 -1]^T; i.e., with z_2 discriminating ω_2 from its complement. We then see that the 1-of-c coding scheme allows us to analyze the classification task as a set of dichotomizations, whereby any class is discriminated from its complement. It is not surprising, therefore, that in the following chapters we will concentrate on studying two-class classification problems; the results of the study carry over directly to multi-class problems.
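As a concrete illustration of the 1-of-c coding, the short Python sketch below is our own (it assumes NumPy; the function name one_of_c and the bipolar option are illustrative, not taken from the text). It builds both the {0, 1} and the {-1, 1} target vectors:

import numpy as np

def one_of_c(labels, c, bipolar=False):
    # Encode integer class labels 1..c as 1-of-c target vectors.
    # With bipolar=False, class omega_2 (label 2) and c = 4 give [0 1 0 0];
    # with bipolar=True they give [-1 1 -1 -1].
    t = np.full((len(labels), c), -1.0 if bipolar else 0.0)
    t[np.arange(len(labels)), np.asarray(labels) - 1] = 1.0
    return t

print(one_of_c([2], 4))                # [[0. 1. 0. 0.]]
print(one_of_c([2], 4, bipolar=True))  # [[-1.  1. -1. -1.]]

Each output z_j can then be read as a dichotomizer separating ω_j from its complement, which is what the 1-of-c scheme exploits.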
Classifiers such as several types of neural networks (multilayer perceptrons, for example) are regression-like machines: the classifier builds a continuous approximation to the targets, using some function y_w, which is subsequently thresholded in order to yield numerical values according to the coding scheme. For instance, when the neural network outputs are produced by sigmoid units in the [0, 1] range one may set a threshold at 0.5; an output larger than 0.5 is assigned the class label corresponding to T = 1 (say, ω_2), otherwise to its complement (ω̄_2), corresponding to T = 0. The function y_w can be interpreted in a geometrical sense, with its X × y_w(X) graph representing a decision surface that separates (discriminates) instances from distinct classes. The classifier mapping is z_w : X → T, with z_w = θ_w(y_w) and θ_w an appropriate thresholding function.
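A minimal sketch of this thresholding step (ours, in Python/NumPy; the linear score, weights, and inputs are made-up stand-ins for whatever y_w a particular network computes):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def theta(y, thr=0.5):
    # Thresholding function theta_w: assign T = 1 (say, omega_2) when
    # y_w(x) > 0.5, otherwise T = 0 (the complement of omega_2).
    return (y > thr).astype(int)

# Hypothetical y_w: a linear score fed to a single sigmoid output unit.
w, b = np.array([1.5, -0.8]), 0.2
X = np.array([[0.4, 0.1], [-1.0, 2.0]])
y = sigmoid(X @ w + b)   # continuous approximation to the targets
z = theta(y)             # classifier output z_w = theta_w(y_w)
print(y)                 # approximately [0.67 0.05]
print(z)                 # [1 0]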
Neural networks and decision trees are used extensively in the present text to illustrate the minimum error entropy approach to supervised classification. There is an abundant literature on theoretical issues, as well as on architectures and design methods, of neural networks and decision trees. Relevant references on neural networks are [26, 27, 142, 96]; for decision trees, see [33] and [177]. Supervised and unsupervised classifiers, especially with a focus on Pattern Recognition applications, are presented in detail by [59] and [225]. A main reference on theoretical issues concerning classifier operation and performance is [52].
1.2 Data-Based Learning
For a classifier whose design is based on a set (X_ds, T_ds) = {(x_i, t(x_i)); i = 1, 2, ..., n}, some type of approximation of the output z_w(x) to t(x), for all x ∈ X, is required. Of all possible approximation criteria one can think of, there is one criterion that clearly stands out, which addresses the mismatch between observed and predicted class labels: how close is the classifier to achieving the minimum probability of (classification) error, P_e, in X × T? We are then interested in the efficacy of the classifier in the population, X × T.
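Since P_e refers to the whole population X × T, in practice one works with data-based estimates of it. The sketch below (ours, in Python/NumPy, with made-up labels) computes the simplest such estimate, the error rate on the design set:

import numpy as np

def empirical_error(z, t):
    # Fraction of design-set cases where the classifier output z_w(x_i)
    # disagrees with the target t(x_i); an estimate of P_e on (X_ds, T_ds).
    z, t = np.asarray(z), np.asarray(t)
    return float(np.mean(z != t))

# Toy design set of n = 6 cases with {0, 1} targets
t_ds = np.array([0, 1, 1, 0, 1, 0])
z_ds = np.array([0, 1, 0, 0, 1, 1])   # classifier outputs z_w(x_i)
print(empirical_error(z_ds, t_ds))    # 0.333... (2 errors out of 6)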
Figure 1.1 illustrates a data-based classification system implementing z_w ∈ Z_W, where the parameter w is tuned by a learning algorithm, built expressly for the design process alone. The learning algorithm performs an
 