1.3.3 Probabilistic Classification and Bayes Formula
Assume that, after analyzing a classification problem, a statistical classifica-
tion approach has been deemed preferable to, for instance, a decision tree.
Probabilistic classification methods are based on the idea that both features
and classes may be modeled as random variables (readers unfamiliar with ran-
dom variables will find more information at the beginning of Chap. 2). In that
context, if a pattern is picked randomly from the patterns to be classified, the
class to which it belongs is the realization of a discrete random variable. Sim-
ilarly, the values of the features of a randomly chosen pattern can be viewed
as realizations of random variables, which are usually continuous. For instance,
in the example of discrimination between capacitors and integrated circuits
(Fig. 1.16), the random variable “class” may be equal to 0 for a capacitor and
to 1 for an integrated circuit, while the reflectivity R and the area A may be
viewed as continuous random variables.
In that context, the classification problem can be simply stated as follows:
given a pattern whose class is unknown, whose reflectivity is equal to r and
whose area is equal to a (within measurement uncertainties), what is the
probability that the random variable “class” be equal to 0 (i.e., that the
pattern be a capacitor)? This probability is the posterior probability of class
“capacitor” given the measured reflectivity and area, denoted by
Pr(class = 0 | {r, a}).
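Anticipating the Bayes formula that gives this section its name, that posterior can be written as follows (a sketch, where p denotes the joint probability density of the features; the prior probability and the likelihood appearing here are defined below):

```latex
\Pr(\mathrm{class}=0 \mid \{r,a\})
  = \frac{p(\{r,a\} \mid \mathrm{class}=0)\,\Pr(\mathrm{class}=0)}
         {p(\{r,a\})}
```

where Pr(class = 0) is the prior probability of the class "capacitor," and p({r, a} | class = 0) is the likelihood of that class given the measured reflectivity and area.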
Consider a set of capacitors and integrated circuits that have been labeled with
the labels (0 or 1) of their classes, and whose feature values are also known.
That information can be used for deriving two very important quantities:

• the prior probability of each class: a pattern picked randomly from the set
of patterns has a probability Pr(C_i) of belonging to class C_i. If we assume
that each pattern belongs to one of the classes, then one has Σ_i Pr(C_i) = 1.
That information is relevant to classification: assume that the prior prob-
ability of the class “capacitor” is known to be 0.9 (hence the probability
of the class “integrated circuit” is 0.1); then a dumb classifier that would
always choose the class “capacitor,” irrespective of the pattern features,
would exhibit an error rate on the order of 10%.
• the conditional probability density of each feature: if an integrated circuit
is picked randomly, what is the probability for its area A to lie in an
interval [a, a + δa]? Clearly, that probability is proportional to δa.
The probability density of feature A conditioned on class C_i, or likelihood
of C_i given feature a, is denoted by p_A(a | C_i): the probability that feature
A be in the interval [a, a + δa] given that the pattern belongs to class C_i is equal
to p_A(a | C_i) δa. Since the pattern whose feature A is measured belongs to
class C_i, one has ∫ p_A(a | C_i) da = 1.
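The two quantities above can be illustrated numerically. The following is a minimal sketch using an invented labeled set (class 0 = capacitor, class 1 = integrated circuit); the sample size, priors, and area distributions are all assumptions for illustration, not values taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented labeled set: class 0 = capacitor (prior 0.9), class 1 = IC (prior 0.1).
labels = rng.choice([0, 1], size=n, p=[0.9, 0.1])
# Invented class-conditional area distributions (purely illustrative).
areas = np.where(labels == 0,
                 rng.normal(2.0, 0.5, size=n),
                 rng.normal(6.0, 1.0, size=n))

# Prior probabilities Pr(C_i), estimated as class frequencies; they sum to 1.
priors = np.bincount(labels) / n

# The "dumb" classifier that always answers "capacitor" errs exactly on the
# class-1 patterns, i.e. with a rate close to the prior of class 1 (about 0.1).
dumb_error_rate = float(np.mean(labels == 1))

# Conditional density p_A(a | C_1), estimated by a normalized histogram;
# with density=True the histogram integrates to 1 over its support,
# mirroring the normalization  ∫ p_A(a | C_1) da = 1.
density, edges = np.histogram(areas[labels == 1], bins=30, density=True)
integral = float(np.sum(density * np.diff(edges)))
```

With estimated priors and likelihoods in hand, the posterior defined earlier follows from Bayes' formula, since Pr(C_i | a) is proportional to p_A(a | C_i) Pr(C_i).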
Figure 1.17 shows an estimate of the probability density p_A(a | class =
integrated circuit) as a function of a. Similarly, one could draw the conditional