is the posterior probability, or a posteriori probability, of H conditioned on X. For example, suppose our world of data tuples is confined to customers described by the attributes age and income, respectively, and that X is a 35-year-old customer with an income of $40,000. Suppose that H is the hypothesis that our customer will buy a computer. Then P(H|X) reflects the probability that customer X will buy a computer given that we know the customer's age and income.
In contrast, P(H) is the prior probability, or a priori probability, of H. For our example, this is the probability that any given customer will buy a computer, regardless of age, income, or any other information, for that matter. The posterior probability, P(H|X), is based on more information (e.g., customer information) than the prior probability, P(H), which is independent of X.
Similarly, P(X|H) is the posterior probability of X conditioned on H. That is, it is the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.

P(X) is the prior probability of X. Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.
“How are these probabilities estimated?” P(H), P(X|H), and P(X) may be estimated from the given data, as we shall see next. Bayes' theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H), and P(X). Bayes' theorem is

\[
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}. \tag{8.10}
\]
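To make Eq. (8.10) concrete, here is a minimal numerical sketch in Python. The probability values are hypothetical, chosen only for illustration; they are not derived from any data set in the text.

```python
# Illustrative (made-up) probabilities for the running example:
# H = "customer buys a computer", X = "customer is 35 years old, earns $40,000".
p_h = 0.60          # P(H): prior probability that a customer buys a computer
p_x_given_h = 0.20  # P(X|H): probability of this age/income profile among buyers
p_x = 0.15          # P(X): probability of this age/income profile among all customers

# Bayes' theorem, Eq. (8.10): P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.2f}")  # prints 0.80 with these values
```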
Now that we have that out of the way, in the next section, we will look at how Bayes'
theorem is used in the naïve Bayesian classifier.
8.3.2 Naïve Bayesian Classification
The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple
is represented by an n-dimensional attribute vector, X = (x_1, x_2, ..., x_n), depicting n measurements made on the tuple from n attributes, respectively, A_1, A_2, ..., A_n.
2. Suppose that there are m classes, C_1, C_2, ..., C_m. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class C_i if and only if

\[
P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i.
\]
Thus, we maximize P(C_i|X). The class C_i for which P(C_i|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem (Eq. 8.10),

\[
P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}. \tag{8.11}
\]
 