The best classifier is

$$z_{w^*} = \arg\min_{z_w \in Z_W} P_e(Z_w). \qquad (1.5)$$
The classifier $z_{w^*}: X \to T$, with optimal parameter $w^*$, is the best one (in the minimum $P_e$ sense) in the family $Z_W$. We will often denote $P_e(Z_{w^*})$ simply as $\min P_e$, signifying $\min_{Z_W} P_e(Z_w)$, the minimum probability of error for the functional family allowed by the classifier architecture.
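As an illustration of the minimization in (1.5), the following sketch searches a small family of threshold classifiers for the one with the lowest error rate on a synthetic two-class problem. The Gaussian class-conditionals, the grid over $w$, and all names are assumptions made for this example, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class 1-D problem (distributions are assumptions):
# class 0 ~ N(-1, 1), class 1 ~ N(+1, 1), equal priors.
n = 100_000
t = rng.integers(0, 2, size=n)            # targets t in {0, 1}
x = rng.normal(2.0 * t - 1.0, 1.0)        # inputs

# Family Z_W: threshold classifiers z_w(x) = 1[x > w], w on a grid.
W = np.linspace(-3.0, 3.0, 121)

def empirical_pe(w):
    """Estimate P_e(z_w) by the error rate on the sample."""
    return np.mean((x > w).astype(int) != t)

pe = np.array([empirical_pe(w) for w in W])
w_star = W[np.argmin(pe)]                 # the argmin of eq. (1.5) over the grid
print(f"w* ~ {w_star:.2f}, min Pe ~ {pe.min():.3f}")
```

For this symmetric setup the selected threshold lands near 0 and the minimum error near $\Phi(-1) \approx 0.159$, the best achievable within this family.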
An important aspect concerning the estimates $P_e(n)$ produced by a classifier is whether or not they will converge (in some sense) with growing $n$ to $\min P_e$. This consistency issue of the learning algorithm will be addressed when appropriate.
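A toy simulation can make this convergence concrete. The setup below (equal-prior classes with Gaussian class-conditionals and a fixed threshold classifier) is an assumption chosen for illustration; the sample error estimates approach the true error as $n$ grows.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Assumed setup: classes 0 and 1 with equal priors, x | t ~ N(2t - 1, 1).
# For the fixed classifier z(x) = 1[x > 0], the true error is
# Phi(-1) ~ 0.1587, where Phi is the standard normal CDF.
true_pe = 0.5 * (1 + erf(-1 / sqrt(2)))

estimates = {}
for n in (100, 10_000, 1_000_000):
    t = rng.integers(0, 2, size=n)
    x = rng.normal(2.0 * t - 1.0, 1.0)
    estimates[n] = float(np.mean((x > 0).astype(int) != t))
    print(n, round(estimates[n], 4))
```

The estimate's standard deviation shrinks as $1/\sqrt{n}$, so the largest sample lies very close to the true error.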
4. If one knew the class priors, $P(t_k)$, and the class conditional distributions of the targets, $p(x|t_k)$, with $p$ representing either a PMF or a PDF, one would then be able to determine the best possible classifier based on the Bayes decision theory: just pick the class that maximizes the posterior probability

$$P(t_k|x) = \frac{p(x|t_k)\,P(t_k)}{p(x)}, \quad \text{with} \quad p(x) = \sum_{k=1}^{c} p(x|t_k)\,P(t_k). \qquad (1.6)$$
This is the procedure followed by the model-based approach to classification. The best classifier, the one maximizing $P(t_k|x)$, is known as the Bayes classifier, $z_{\text{Bayes}}$.
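A minimal sketch of this posterior-maximization rule follows; the priors, Gaussian class-conditionals, and function names are assumptions for the example only.

```python
import numpy as np

# Assumed model: two classes with known priors and Gaussian class-conditionals.
priors = np.array([0.5, 0.5])                  # P(t_k)
means, sigma = np.array([-1.0, 1.0]), 1.0      # x | t_k ~ N(means[k], sigma^2)

def gauss_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def bayes_classify(x):
    """Pick the class maximizing the posterior of eq. (1.6)."""
    likelihoods = gauss_pdf(x, means, sigma)   # p(x | t_k)
    joint = likelihoods * priors               # p(x | t_k) P(t_k)
    posterior = joint / joint.sum()            # divide by p(x) = sum_k joint
    return int(np.argmax(posterior))

print(bayes_classify(-0.3))  # -> 0 (left of the midpoint 0)
print(bayes_classify(0.7))   # -> 1
```

Since $p(x)$ is the same for every class, dividing by it does not change the argmax; it is kept here only to show the full posterior of (1.6).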
One always has $P_e(Z_w) \ge P_e(Z_{w^*}) \ge P_e(Z_{\text{Bayes}})$. Note that there will be function families $Z_B$ such that $z_{w^*}(\cdot) = z_{\text{Bayes}}(\cdot)$ with $w^* \in B$ (e.g., multilayer perceptrons with "enough" hidden neurons are known to have universal functional approximation capabilities); however, one usually will not be sure whether or not $Z_B$ is implementable by the classification system being used (for multilayer perceptrons, "enough" may not be affordable, among other things because of the generalization issue). We will therefore not pursue the task of analyzing the approximation of data-based classifiers to $z_{\text{Bayes}}$.
We shall also not discuss whether $z_w$ converges with $n$ (in some sense) to $z_{\text{Bayes}}$, the so-called Bayes-consistency issue, which depends largely on the classification system being used; as a matter of fact, the lack of Bayes-consistency does not preclude the usefulness of a classification system (binary decision trees with impurity decision rules are an example of that). For details on the consistency of classification systems the reader may find it useful to consult [52] and [11].
Let us now address the problem of how to find the best classifier $z_{w^*}$, affordable by the function family $Z_W$ implemented by the classification system. One could consider using formula (1.4) (with large $n$ so that $P_e(n)$ is close to $P_e(Z_{w^*})$) and perform an exhaustive search in some discrete version of the