each weight affects the decision variable independently. Although this assumption limits the scope of the naïve Bayesian model to some extent, in practical applications the model both reduces the complexity of model construction exponentially and exhibits striking robustness and effectiveness even when the assumption is not satisfied (Nigam, 1998). It has been successfully applied to many data mining tasks, such as classification, clustering, and model selection. Currently, many researchers are working to relax the independence assumption among variables (Heckerman, 1997) so that the model can be applied more widely.
6.4.1 Naïve Bayesian learning model
Bayes' theorem tells us how to predict the class of an incoming sample given the training samples. The classification rule is that of maximum posterior probability, given by the following equation:
\[
P(C_i \mid A) = \frac{P(C_i)\, P(A \mid C_i)}{P(A)}
\tag{6.18}
\]
Here A is a test sample to be classified, and P(Y|X) denotes the conditional probability of Y given X. The probabilities on the right side of the equation can be estimated from the training data. Suppose that the sample is represented as a vector of features. If all features are independent given the class, P(A|C_i) can be decomposed into a product of factors:

\[
P(A \mid C_i) = P(a_1 \mid C_i) \times P(a_2 \mid C_i) \times \cdots \times P(a_m \mid C_i),
\]

where a_i is the i-th feature of the test sample. Accordingly, the posterior computation can be rewritten as:
\[
P(C_i \mid A) = \frac{P(C_i)\, P(a_1 \mid C_i)\, P(a_2 \mid C_i) \cdots P(a_m \mid C_i)}{P(A)}
              = \frac{P(C_i) \prod_{j=1}^{m} P(a_j \mid C_i)}{P(A)}
\tag{6.19}
\]
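The following is a minimal sketch of how Equation (6.19) could be computed in practice; it is an illustration rather than the procedure described in this text. It assumes categorical features, estimates P(C_i) and P(a_j | C_i) by counting over the training data, applies Laplace smoothing so that unseen feature values do not zero out the product (the alpha parameter and names such as train_naive_bayes and classify are illustrative assumptions), and drops P(A), which is identical for every class.

import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate P(C_i) and the per-feature counts needed for P(a_j | C_i)."""
    n = len(samples)
    class_counts = Counter(labels)              # class frequencies
    feature_counts = defaultdict(Counter)       # (class, j) -> counts of feature values
    feature_values = defaultdict(set)           # j -> distinct values seen for feature j
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[(c, j)][v] += 1
            feature_values[j].add(v)
    priors = {c: cnt / n for c, cnt in class_counts.items()}   # P(C_i)
    return priors, feature_counts, feature_values, class_counts

def classify(x, priors, feature_counts, feature_values, class_counts, alpha=1.0):
    """Return the class maximizing P(C_i) * prod_j P(a_j | C_i)  (Eq. 6.19).
    P(A) is the same for all classes, so it can be ignored in the comparison."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)                 # log space avoids numerical underflow
        for j, v in enumerate(x):
            counts = feature_counts[(c, j)]
            # Laplace-smoothed estimate of P(a_j | C_i)
            p = (counts[v] + alpha) / (class_counts[c] + alpha * len(feature_values[j]))
            score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy usage: two categorical features, two classes.
samples = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(samples, labels)
print(classify(("rain", "mild"), *model))       # expected output: yes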
The entire process is called naïve Bayesian classification. In principle, the naïve Bayesian classifier should achieve optimal or near-optimal results only when the independence assumption holds, or when the correlation among features is very weak. Yet this strong condition seems inconsistent with the fact that the naïve Bayesian classifier achieves striking performance in many domains, including some where there are obvious dependences among features. On 16 of the 28 UCI data sets, the naïve Bayesian classifier outperforms the C4.5 algorithm and performs comparably to CN2 and PEBLS. Several studies report similar results (Clark & Niblett, 1989; Dougherty, Kohavi & Sahami, 1995). At the same time, researchers have also proposed successful strategies to relax the independence assumption among features (Nigam, 1998).