Note that
• if the prior probabilities are equal, the posterior probabilities are independent of the prior probabilities, so that the classification relies solely on the likelihoods of the classes;
• if the likelihoods are equal, i.e., if the features have no discriminative power whatsoever, the classification depends on the prior probabilities only.
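Both limiting cases can be checked with a minimal numeric sketch of Bayes' formula (the function name and the probability values are illustrative, not from the text):

```python
def posterior(priors, likelihoods):
    """Posterior probabilities via Bayes' formula: normalized products
    of prior probabilities and likelihoods."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Equal priors: the posterior follows the likelihoods alone.
print(posterior([0.5, 0.5], [0.8, 0.2]))  # proportional to the likelihoods

# Equal likelihoods (no discriminative power): the posterior
# simply reproduces the priors.
print(posterior([0.7, 0.3], [0.4, 0.4]))  # recovers the priors
```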
Elegant though the Bayesian formulation may be, there is a major difficulty
in its practical application: the estimation of the quantities in the right-hand
side of Bayes formula. Obtaining a good estimate of the prior probabilities of
the classes Pr(Ci) is generally an easy task, through simple frequency counting of each class in the sample. In contrast, the estimation of the likelihoods pX(x | Ci) is subject to a difficulty known as the curse of dimensionality: the
number of patterns necessary for a reliable estimation of the likelihoods grows
exponentially with the dimension of the feature vector. When low-level repre-
sentations of the patterns are used, the number of features may be very large:
if a picture is described by the intensity of its pixels, the dimension of the fea-
ture vector is equal to the number of pixels. We will show that neural networks
are an interesting alternative to Bayesian classification because they provide
a direct estimate of the posterior probabilities without having to estimate the
prior class probabilities and the likelihoods.
Consider an application of Bayes formula: Assume that the probability
distribution of the heights of women in a given population is Gaussian with
mean 1.65 m and standard deviation 0.16 m,
$$p_H(h \mid W) = \frac{1}{0.16\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{h - 1.65}{0.16}\right)^2\right],$$
−
and that the probability distribution of the heights of men in that population
is a Gaussian with mean 1.75 m and standard deviation 0.15 m:
$$p_H(h \mid M) = \frac{1}{0.15\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{h - 1.75}{0.15}\right)^2\right].$$
The above probability densities are shown in Fig. 1.18. The Gaussians overlap strongly, which shows that the feature "height" is not very discriminant. In a real application, such curves would be a strong incentive for the designer to find one or more alternative features.
In addition, assume that there are as many men as women in the population. Given a person whose height is 1.60 m, what is the probability that this person is a woman? The answer is provided by Bayes' formula:
$$\Pr(W \mid 1.60) = \frac{0.5\, p_H(1.60 \mid W)}{0.5\, p_H(1.60 \mid W) + 0.5\, p_H(1.60 \mid M)} \approx 60\%.$$
Clearly, Pr(M | 1.60) = 40%.
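This numerical result can be verified with a short Python sketch that evaluates the two Gaussian densities at h = 1.60 m and applies Bayes' formula with equal priors (the function name is illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Likelihoods of the observed height 1.60 m under each class.
p_w = gaussian_pdf(1.60, mu=1.65, sigma=0.16)  # women
p_m = gaussian_pdf(1.60, mu=1.75, sigma=0.15)  # men

# Equal priors (0.5 each); Bayes' formula gives the posterior.
post_w = 0.5 * p_w / (0.5 * p_w + 0.5 * p_m)
print(f"Pr(W | 1.60) = {post_w:.2f}")       # ≈ 0.60
print(f"Pr(M | 1.60) = {1 - post_w:.2f}")   # ≈ 0.40
```

Note that the equal priors cancel out of the ratio, so the posterior here depends only on the two likelihoods, exactly as the first bullet point above states.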