9.1.1 Concepts and Mechanisms
The naïve Bayesian classifier makes the assumption of class conditional independence,
that is, given the class label of a tuple, the values of the attributes are assumed to
be conditionally independent of one another. This simplifies computation. When the
assumption holds true, the naïve Bayesian classifier is the most accurate in comparison
with all other classifiers. In practice, however, dependencies can exist between
variables. Bayesian belief networks specify joint conditional probability distributions.
They allow class conditional independencies to be defined between subsets of variables.
They provide a graphical model of causal relationships, on which learning can be
performed. Trained Bayesian belief networks can be used for classification. Bayesian belief
networks are also known as belief networks, Bayesian networks, and probabilistic
networks. For brevity, we will refer to them as belief networks.
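To make the simplification concrete: under class conditional independence, P(X | C_i) is computed as the product P(x_1 | C_i) × P(x_2 | C_i) × ... × P(x_n | C_i), so one small table per attribute and class suffices instead of one table over every combination of attribute values. The following is a minimal Python sketch under stated assumptions: the attributes are categorical, and the priors, the conditional probabilities, the attribute names (age, student), and the function name naive_bayes_posteriors are all hypothetical, chosen for illustration rather than estimated from data.

```python
from math import prod

# Hypothetical class priors P(C) and per-attribute conditionals P(x_k | C),
# chosen only to illustrate the computation (not estimated from any data set).
PRIORS = {"yes": 0.6, "no": 0.4}
CONDITIONALS = {
    "yes": {"age": {"youth": 0.2, "senior": 0.8}, "student": {"yes": 0.7, "no": 0.3}},
    "no": {"age": {"youth": 0.6, "senior": 0.4}, "student": {"yes": 0.2, "no": 0.8}},
}


def naive_bayes_posteriors(x: dict[str, str]) -> dict[str, float]:
    """Return P(C | X) for each class, assuming class conditional independence."""
    # Under the assumption, P(X | C) is the product of P(x_k | C) over attributes,
    # so the unnormalized score for class C is P(C) * P(x_1 | C) * ... * P(x_n | C).
    scores = {
        c: PRIORS[c] * prod(CONDITIONALS[c][attr][value] for attr, value in x.items())
        for c in PRIORS
    }
    total = sum(scores.values())  # equals P(X); dividing by it normalizes
    return {c: s / total for c, s in scores.items()}


posteriors = naive_bayes_posteriors({"age": "youth", "student": "yes"})
print({c: round(p, 3) for c, p in posteriors.items()})  # {'yes': 0.636, 'no': 0.364}
```

For X = (age = youth, student = yes), the unnormalized scores are 0.6 × 0.2 × 0.7 = 0.084 for yes and 0.4 × 0.6 × 0.2 = 0.048 for no, which normalize to about 0.636 and 0.364. If the two attributes were in fact dependent given the class, this product would misstate P(X | C), which is the situation belief networks are meant to handle.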
A belief network is defined by two components: a directed acyclic graph and a set of
conditional probability tables (Figure 9.1). Each node in the directed acyclic graph
represents a random variable. The variables may be discrete- or continuous-valued. They
may correspond to actual attributes given in the data or to “hidden variables” believed
to form a relationship (e.g., in the case of medical data, a hidden variable may indicate
a syndrome, representing a number of symptoms that, together, characterize a specific
disease). Each arc represents a probabilistic dependence. If an arc is drawn from a node
Y to a node Z, then Y is a parent or immediate predecessor of Z, and Z is a descendant
[Figure 9.1(a): directed acyclic graph over the variables FamilyHistory, Smoker,
LungCancer, Emphysema, PositiveXRay, and Dyspnea]

(b) Conditional probability table for LungCancer:

           FH, S    FH, ~S    ~FH, S    ~FH, ~S
    LC      0.8      0.5       0.7       0.1
    ~LC     0.2      0.5       0.3       0.9

Figure 9.1 Simple Bayesian belief network. (a) A proposed causal model, represented by a directed
acyclic graph. (b) The conditional probability table for the values of the variable LungCancer
(LC) showing each possible combination of the values of its parent nodes, FamilyHistory (FH)
and Smoker (S). Source: Adapted from Russell, Binder, Koller, and Kanazawa [RBKK95].
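The conditional probability table in Figure 9.1(b) can be held as a mapping from each combination of parent values to a distribution over LungCancer. A minimal Python sketch follows. The LungCancer entries are copied from the figure; the priors P_FH and P_S, the treatment of FamilyHistory and Smoker as independent nodes with no parents of their own, and all function names are hypothetical assumptions that the figure does not state. The last function illustrates how such a table can be combined with its parents' probabilities, by summing over every parent combination, to obtain P(LC).

```python
from itertools import product

# CPT for LungCancer (LC), copied from Figure 9.1(b). Each key is one combination
# of parent values (FamilyHistory, Smoker); each value maps LC -> probability.
LC_CPT = {
    (True, True): {True: 0.8, False: 0.2},    # FH, S
    (True, False): {True: 0.5, False: 0.5},   # FH, ~S
    (False, True): {True: 0.7, False: 0.3},   # ~FH, S
    (False, False): {True: 0.1, False: 0.9},  # ~FH, ~S
}


def check_cpt(cpt: dict) -> None:
    """Raise if any column (one parent combination) is not a distribution over LC."""
    for parents, dist in cpt.items():
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError(f"column {parents} does not sum to 1")


def p_lc(lc: bool, fh: bool, s: bool) -> float:
    """P(LungCancer = lc | FamilyHistory = fh, Smoker = s), read from the CPT."""
    return LC_CPT[(fh, s)][lc]


# HYPOTHETICAL priors, not part of Figure 9.1. FH and S are treated as
# independent nodes with no parents purely for this illustration.
P_FH = {True: 0.1, False: 0.9}
P_S = {True: 0.3, False: 0.7}


def marginal_p_lc(lc: bool) -> float:
    """P(LC = lc) = sum over fh, s of P(fh) * P(s) * P(lc | fh, s)."""
    return sum(
        P_FH[fh] * P_S[s] * p_lc(lc, fh, s)
        for fh, s in product((True, False), repeat=2)
    )


check_cpt(LC_CPT)
print(p_lc(True, fh=True, s=False))   # 0.5
print(round(marginal_p_lc(True), 3))  # 0.311
```

With these assumed priors, P(LC = yes) = 0.024 + 0.035 + 0.189 + 0.063 = 0.311. Keying each column on a tuple of parent values keeps every lookup a single dictionary access, and check_cpt catches a column that fails to sum to 1, the property visible in each column of the table (for example, 0.8 + 0.2 for FH, S).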
 