Note that
• if the prior probabilities are equal, the posterior probabilities are independent of the prior probabilities, so that the classification relies solely on the likelihoods of the classes;
• if the likelihoods are equal, i.e., if the features have no discriminative power whatsoever, the classification depends on the prior probabilities only.
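Both limiting cases can be checked with a minimal numeric sketch of Bayes' formula (the function name and the probability values are illustrative, not from the text):

```python
def posterior(priors, likelihoods):
    """Posterior probabilities via Bayes' formula: normalized products
    of prior probabilities and likelihoods."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Equal priors: the posterior follows the likelihoods alone.
print(posterior([0.5, 0.5], [0.8, 0.2]))  # proportional to the likelihoods

# Equal likelihoods (no discriminative power): the posterior
# simply reproduces the priors.
print(posterior([0.7, 0.3], [0.4, 0.4]))  # recovers the priors
```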
Elegant though the Bayesian formulation may be, there is a major difficulty
in its practical application: the estimation of the quantities in the right-hand
side of Bayes formula. Obtaining a good estimate of the prior probabilities of
the classes Pr(Ci) is generally an easy task, through simple frequency counting of each class in the sample. In contrast, the estimation of the likelihoods pX(x | Ci) is subject to a difficulty known as the curse of dimensionality: the
number of patterns necessary for a reliable estimation of the likelihoods grows
exponentially with the dimension of the feature vector. When low-level repre-
sentations of the patterns are used, the number of features may be very large:
if a picture is described by the intensity of its pixels, the dimension of the fea-
ture vector is equal to the number of pixels. We will show that neural networks
are an interesting alternative to Bayesian classification because they provide
a direct estimate of the posterior probabilities without having to estimate the
prior class probabilities and the likelihoods.
Consider an application of Bayes formula: Assume that the probability
distribution of the heights of women in a given population is Gaussian with
mean 1.65 m and standard deviation 0.16 m,
$$p_H(h \mid W) = \frac{1}{0.16\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{h - 1.65}{0.16}\right)^2\right],$$
−
and that the probability distribution of the heights of men in that population
is a Gaussian with mean 1.75 m and standard deviation 0.15 m:
$$p_H(h \mid M) = \frac{1}{0.15\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{h - 1.75}{0.15}\right)^2\right].$$
The above probability densities are shown in Fig. 1.18. The Gaussians overlap strongly, which shows that the feature "height" is not very discriminant. In a real application, such curves would be a strong incentive for the designer to find one or more alternative features.
In addition, assume that there are as many men as women in the population. Given a person whose height is 1.60 m, what is the probability that this person is a woman? The answer is provided by Bayes' formula:
$$\Pr(W \mid 1.60) = \frac{0.5\, p_H(1.60 \mid W)}{0.5\, p_H(1.60 \mid W) + 0.5\, p_H(1.60 \mid M)} \approx 60\%.$$
Clearly, Pr(M | 1.60) = 40%.
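This numerical result can be verified with a short Python sketch that evaluates the two Gaussian densities at h = 1.60 m and applies Bayes' formula with equal priors (the function name is illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Likelihoods of the observed height 1.60 m under each class.
p_w = gaussian_pdf(1.60, mu=1.65, sigma=0.16)  # women
p_m = gaussian_pdf(1.60, mu=1.75, sigma=0.15)  # men

# Equal priors (0.5 each); Bayes' formula gives the posterior.
post_w = 0.5 * p_w / (0.5 * p_w + 0.5 * p_m)
print(f"Pr(W | 1.60) = {post_w:.2f}")       # ≈ 0.60
print(f"Pr(M | 1.60) = {1 - post_w:.2f}")   # ≈ 0.40
```

Note that the equal priors cancel out of the ratio, so the posterior here depends only on the two likelihoods, exactly as the first bullet point above states.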