Note that:
- if the prior probabilities are equal, the posterior probabilities are independent of the prior probabilities, so that the classification relies solely on the likelihoods of the classes;
- if the likelihoods are equal, i.e., if the features have no discriminative power whatsoever, the classification depends on the prior probabilities only.
Elegant though the Bayesian formulation may be, there is a major difficulty in its practical application: the estimation of the quantities on the right-hand side of Bayes formula. Obtaining a good estimate of the prior probabilities of the classes Pr(C_i) is generally an easy task, through simple frequency counting of each class in the sample. In contrast, the estimation of the likelihoods p_H(x | C_i) is subject to a difficulty known as the curse of dimensionality: the number of patterns necessary for a reliable estimation of the likelihoods grows exponentially with the dimension of the feature vector. When low-level representations of the patterns are used, the number of features may be very large: if a picture is described by the intensities of its pixels, the dimension of the feature vector is equal to the number of pixels. We will show that neural networks are an interesting alternative to Bayesian classification because they provide a direct estimate of the posterior probabilities without requiring estimates of the prior class probabilities and of the likelihoods.
Consider an application of Bayes formula: Assume that the probability
distribution of the heights of women in a given population is Gaussian with
mean 1.65 m and standard deviation 0.16 m,
p_H(h | W) = \frac{1}{0.16\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{h - 1.65}{0.16} \right)^2 \right],
and that the probability distribution of the heights of men in that population
is a Gaussian with mean 1.75 m and standard deviation 0.15 m:
p_H(h | M) = \frac{1}{0.15\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{h - 1.75}{0.15} \right)^2 \right].
The above probability densities are shown in Fig. 1.18. The Gaussians overlap strongly, which shows that the feature height is not very discriminant. In a real application, such curves would be a strong incentive for the designer to find one or more alternative features.
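The overlap can be made concrete by evaluating the two densities numerically. The following sketch (plain Python; the means and standard deviations are those given in the text, and the helper function name is illustrative) tabulates both densities at a few heights:

```python
import math

def gaussian_pdf(h, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((h - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Parameters from the text: women ~ N(1.65, 0.16), men ~ N(1.75, 0.15)
for h in (1.50, 1.60, 1.70, 1.80):
    p_w = gaussian_pdf(h, 1.65, 0.16)  # p_H(h | W)
    p_m = gaussian_pdf(h, 1.75, 0.15)  # p_H(h | M)
    print(f"h = {h:.2f} m: p(h|W) = {p_w:.3f}, p(h|M) = {p_m:.3f}")
```

Around 1.70 m the two densities are of comparable magnitude, which is precisely the overlap visible in Fig. 1.18.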
In addition, assume that there are as many men as women in the population. Given a person whose height is 1.60 m, what is the probability that this person is a woman? The answer is provided by Bayes formula,
\Pr(W | 1.60) = \frac{0.5\, p_H(1.60 | W)}{0.5\, p_H(1.60 | W) + 0.5\, p_H(1.60 | M)} \approx 60\%.

Clearly, Pr(M | 1.60) ≈ 40%.
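This numerical value can be checked in a few lines of Python (a sketch; the densities use the parameters given in the text, and the variable names are illustrative):

```python
import math

def gaussian_pdf(h, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((h - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

h = 1.60                              # observed height in metres
prior_w = prior_m = 0.5               # equal priors: as many men as women
lik_w = gaussian_pdf(h, 1.65, 0.16)   # likelihood p_H(1.60 | W)
lik_m = gaussian_pdf(h, 1.75, 0.15)   # likelihood p_H(1.60 | M)

# Bayes formula: posterior = prior * likelihood / evidence
evidence = prior_w * lik_w + prior_m * lik_m
post_w = prior_w * lik_w / evidence
print(f"Pr(W | 1.60) = {post_w:.0%}")      # about 60%
print(f"Pr(M | 1.60) = {1 - post_w:.0%}")  # about 40%
```

With equal priors, the priors cancel from numerator and denominator, so the posterior reduces to the ratio of the likelihoods, as noted at the start of this section.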