The best classifier is
\[
z_{w^*} = \arg\min_{z_w \in Z_W} P_e(Z_w). \tag{1.5}
\]
The classifier $z_{w^*}: X \to T$, with optimal parameter $w^*$, is the best one (in the minimum $P_e$ sense) in the family $Z_W$. We will often denote $P_e(Z_{w^*})$ simply as $\min P_e$, signifying $\min_{Z_W} P_e$, the minimum probability of error for the functional family allowed by the classifier architecture.
An important aspect concerning the estimates $P_e(n)$ produced by a classifier is whether or not they will converge (in some sense) with growing $n$ to $\min P_e$. This consistency issue of the learning algorithm will be addressed when appropriate.
4. If one knew the class priors, $P(t_k)$, and the class conditional distributions of the targets, $p(x \mid t_k)$, with $p$ representing either a PMF or a PDF, one would then be able to determine the best possible classifier based on Bayes decision theory: just pick the class that maximizes the posterior probability
\[
P(t_k \mid x) = \frac{p(x \mid t_k)\, P(t_k)}{p(x)}, \quad \text{with} \quad p(x) = \sum_{k=1}^{c} p(x \mid t_k)\, P(t_k). \tag{1.6}
\]
This is the procedure followed by the model-based approach to classification. The best classifier, the one maximizing $P(t_k \mid x)$, is known as the Bayes classifier, $z_{\text{Bayes}}$.
One always has $P_e(Z_w) \ge P_e(Z_{w^*}) \ge P_e(Z_{\text{Bayes}})$ (a small numerical illustration of this chain is sketched below). Note that there will be function families $Z_B$ such that $z_w(\cdot) = z_{\text{Bayes}}(\cdot)$ with $w \in B$ (e.g., multilayer perceptrons with "enough" hidden neurons are known to have universal functional approximation capabilities); however, one usually will not be sure whether or not $Z_B$ is implementable by the classification system being used (for multilayer perceptrons, "enough" may not be affordable, among other things because of the generalization issue). We will therefore not pursue the task of analyzing how closely data-based classifiers approximate $z_{\text{Bayes}}$.
We shall also not discuss whether $z_{w^*}$ converges with $n$ (in some sense) to $z_{\text{Bayes}}$, the so-called Bayes-consistency issue, which depends largely on the classification system being used; as a matter of fact, the lack of Bayes-consistency does not preclude the usefulness of a classification system (binary decision trees with impurity decision rules are an example of this). For details on the consistency of classification systems the reader may find it useful to consult [52] and [11].
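
To make the preceding ideas concrete, here is a minimal numerical sketch (not taken from the text) of Eqs. (1.5) and (1.6) on an assumed toy problem: two classes with known priors and one-dimensional Gaussian class conditionals of unequal variance, classified by a restricted family $Z_W$ of single-threshold rules. All distribution parameters, the threshold grid, and the sample size are illustrative assumptions, and the empirical error rate plays the role of the estimate $P_e(n)$; the printed values illustrate the chain $P_e(Z_w) \ge P_e(Z_{w^*}) \ge P_e(Z_{\text{Bayes}})$.

```python
# Illustrative sketch (assumed toy parameters, not the book's example):
# two classes, indices 0 and 1 standing for t_1 and t_2, with known priors
# and 1-D Gaussian class conditionals of unequal variance.  The restricted
# family Z_W consists of single-threshold rules z_w(x) = t_1 if x < w else t_2.
import numpy as np

rng = np.random.default_rng(0)

# Class priors P(t_k) and Gaussian class conditionals p(x|t_k).
priors = np.array([0.5, 0.5])
means, stds = np.array([0.0, 1.5]), np.array([1.0, 2.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_classify(x):
    # Eq. (1.6): pick the class maximizing P(t_k|x), i.e. p(x|t_k) P(t_k).
    scores = np.stack([gauss_pdf(x, m, s) * p
                       for m, s, p in zip(means, stds, priors)])
    return scores.argmax(axis=0)

def threshold_classify(x, w):
    # Restricted family Z_W: class 0 if x < w, class 1 otherwise.
    return (x >= w).astype(int)

# Large labelled sample; its empirical error plays the role of P_e(n).
n = 200_000
labels = rng.choice(2, size=n, p=priors)
x = rng.normal(means[labels], stds[labels])

def error_rate(predictions):
    return np.mean(predictions != labels)

# Eq. (1.5): exhaustive search over a discretised version of W.
grid = np.linspace(-5.0, 5.0, 1001)
errors = np.array([error_rate(threshold_classify(x, w)) for w in grid])
w_star = grid[errors.argmin()]

print(f"P_e(z_w), arbitrary w = 0.0      : {error_rate(threshold_classify(x, 0.0)):.4f}")
print(f"min P_e = P_e(z_w*), w* = {w_star:+.2f}: {errors.min():.4f}")
print(f"P_e(z_Bayes)                     : {error_rate(bayes_classify(x)):.4f}")
```

Because the class conditionals have unequal variances, the Bayes decision region for class 0 is an interval, which no single-threshold rule can reproduce; the sketch therefore also shows a case where $Z_W$ does not contain $z_{\text{Bayes}}$ and $\min P_e$ stays strictly above the Bayes error.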
Let us now address the problem of how to find the best classifier $z_{w^*}$, affordable by the function family $Z_W$ implemented by the classification system. One could consider using formula (1.4) (with large $n$ so that $P_e(n)$ is close to $P_e(Z_{w^*})$) and perform an exhaustive search in some discrete version of the