each weight affects the decision variable independently. Although this assumption limits the scope of the naïve Bayesian model to some extent, in practical applications the model both reduces the complexity of model construction exponentially and exhibits striking robustness and effectiveness even when the assumption is not satisfied (Nigam, 1998). It has been successfully applied to many data mining tasks, such as classification, clustering, and model selection. Currently, many researchers are working to relax the independence assumption among variables (Heckerman, 1997) so that the model can be applied more widely.
6.4.1 Naïve Bayesian learning model
Bayes' theorem tells us how to predict the class of an incoming sample given the training samples. The classification rule is that of maximum posterior probability, given by the following equation:
\[
P(C_i \mid A) = \frac{P(C_i)\, P(A \mid C_i)}{P(A)}
\tag{6.18}
\]
Here A is a test sample to be classified, and P(Y|X) denotes the conditional probability of Y given X. The probabilities on the right side of the equation can be estimated from the training data. Suppose that the sample is represented as a vector of features. If all features are independent given the class, P(A|C_i) can be decomposed into a product of factors:

\[
P(A \mid C_i) = P(a_1 \mid C_i) \times P(a_2 \mid C_i) \times \cdots \times P(a_m \mid C_i),
\]

where a_i is the i-th feature of the test sample. Accordingly, the posterior computation can be rewritten as:
\[
P(C_i \mid A) = \frac{P(C_i)\, P(a_1 \mid C_i)\, P(a_2 \mid C_i) \cdots P(a_m \mid C_i)}{P(A)}
              = \frac{P(C_i) \prod_{j=1}^{m} P(a_j \mid C_i)}{P(A)}
\tag{6.19}
\]
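The following is a minimal sketch of how Equation (6.19) could be computed in practice; it is an illustration rather than the procedure described in this text. It assumes categorical features, estimates P(C_i) and P(a_j | C_i) by counting over the training data, applies Laplace smoothing so that unseen feature values do not zero out the product (the alpha parameter and names such as train_naive_bayes and classify are illustrative assumptions), and drops P(A), which is identical for every class.

import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate P(C_i) and the per-feature counts needed for P(a_j | C_i)."""
    n = len(samples)
    class_counts = Counter(labels)              # class frequencies
    feature_counts = defaultdict(Counter)       # (class, j) -> counts of feature values
    feature_values = defaultdict(set)           # j -> distinct values seen for feature j
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            feature_counts[(c, j)][v] += 1
            feature_values[j].add(v)
    priors = {c: cnt / n for c, cnt in class_counts.items()}   # P(C_i)
    return priors, feature_counts, feature_values, class_counts

def classify(x, priors, feature_counts, feature_values, class_counts, alpha=1.0):
    """Return the class maximizing P(C_i) * prod_j P(a_j | C_i)  (Eq. 6.19).
    P(A) is the same for all classes, so it can be ignored in the comparison."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)                 # log space avoids numerical underflow
        for j, v in enumerate(x):
            counts = feature_counts[(c, j)]
            # Laplace-smoothed estimate of P(a_j | C_i)
            p = (counts[v] + alpha) / (class_counts[c] + alpha * len(feature_values[j]))
            score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy usage: two categorical features, two classes.
samples = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(samples, labels)
print(classify(("rain", "mild"), *model))       # expected output: yes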
The entire process is called naïve Bayesian classification. In principle, the naïve Bayesian classifier should achieve optimal or near-optimal results only when the independence assumption holds, or when the correlation among features is very weak. Yet this strong condition seems inconsistent with the fact that the naïve Bayesian classifier achieves striking performance in many domains, including some where there are obvious dependences among features. On 16 of the 28 UCI data sets, the naïve Bayesian classifier outperforms the C4.5 algorithm and performs comparably to CN2 and PEBLS. Several studies report similar results (Clark & Niblett, 1989; Dougherty, Kohavi & Sahami, 1995). At the same time, researchers have also proposed successful strategies to relax the independence assumption among features (Nigam, 1998).