naïve Bayesian classifier, the time complexity of each round is O(nf). T rounds of training therefore correspond to O(Tnf). Since T is a constant, the entire time complexity is still O(nf).
For the naïve Bayesian classifier, the primary computation is counting. Training examples can be processed either sequentially or in batches from disk or tape, so this method is well suited to knowledge discovery on large data sets: the training set need not be loaded into memory entirely, and part of it can be kept on disk or tape. Nevertheless, the boosted naïve Bayesian model has the following problems.
(1) By the very idea of boosting, when noise exists in the training set, the boosting method takes it as useful information and amplifies its effect with large weights. This reduces the performance of boosting; if there are many noisy examples, boosting leads to worse results.
(2) Although in theory boosting can achieve a zero error rate on the training set, in practical applications of the naïve Bayesian model a zero classification error on the training set can hardly be guaranteed.
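The counting-based training described above can be sketched in a short program. This is a minimal illustration, not the book's implementation: it assumes categorical features, a streamed training set, and Laplace smoothing (an assumption added here to avoid zero counts). One sequential pass over n examples with f features performs only counter updates, matching the O(nf) cost discussed above.

```python
from collections import defaultdict
import math

def train_nb(stream, n_features):
    """One sequential pass over the examples: only counting, O(n*f) total.
    The stream may come from disk or tape; nothing is kept but the counts."""
    class_count = defaultdict(int)                  # class -> count
    feat_count = defaultdict(int)                   # (class, feature, value) -> count
    feat_values = [set() for _ in range(n_features)]
    for x, y in stream:
        class_count[y] += 1
        for j, v in enumerate(x):
            feat_count[(y, j, v)] += 1
            feat_values[j].add(v)
    return class_count, feat_count, feat_values

def predict(model, x):
    class_count, feat_count, feat_values = model
    n = sum(class_count.values())
    best, best_lp = None, float("-inf")
    for c, nc in class_count.items():
        lp = math.log(nc / n)                       # log prior
        for j, v in enumerate(x):
            # Laplace smoothing (assumed here) keeps unseen values
            # from zeroing out the whole product.
            num = feat_count[(c, j, v)] + 1
            den = nc + len(feat_values[j])
            lp += math.log(num / den)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy data, purely illustrative: two binary features, two classes.
data = [((0, 1), "a"), ((0, 0), "a"), ((1, 1), "b"), ((1, 0), "b")]
model = train_nb(iter(data), 2)
print(predict(model, (0, 1)))   # -> "a"
```

Because the model is just a bag of counters, a boosted variant only has to replace the unit increments with example weights; the per-round cost stays O(nf).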
6.5 Construction of Bayesian Network
6.5.1 Structure of Bayesian network and its construction
In short, a Bayesian network is a directed acyclic graph with probabilistic nodes. The graphical model can be used to represent the (physical or Bayesian) joint distribution of a large variable set, and to analyze correlations among numerous variables. With its capability for learning and statistical reasoning under Bayes' theorem, it can fulfill many data mining tasks, such as prediction, classification, clustering, causal analysis and so on.
Given a series of variables X = {x_1, x_2, …, x_n}, a Bayesian network is composed of two components: one is the network structure S, which represents the conditional independences among the variables in X; the other is the set of local distributions P, one associated with every variable. The two components together define the joint probability of X. S is a directed acyclic graph whose nodes correspond one-to-one with the variables in X. Let x_i be a variable (or node) and Pa_i be the set of parent nodes of x_i in S. The absence of an arc between nodes usually represents conditional independence. The joint probability of X is represented as:

p(X) = ∏_{i=1}^{n} p(x_i | Pa_i)        (6.32)
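Equation (6.32) can be made concrete with a small sketch. The three-node network and its conditional probability tables below are hypothetical, chosen only to illustrate the factorization; each variable is assumed binary, and each table stores p(x_i = 1 | Pa_i) keyed by the parents' values.

```python
# Hypothetical network: x1 -> x2, x1 -> x3, x2 -> x3.
parents = {"x1": [], "x2": ["x1"], "x3": ["x1", "x2"]}

# Illustrative CPTs (assumed values): p(x_i = 1 | parent assignment).
cpt = {
    "x1": {(): 0.3},
    "x2": {(0,): 0.2, (1,): 0.7},
    "x3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assignment):
    """Equation (6.32): p(X) = prod_i p(x_i | Pa_i)."""
    p = 1.0
    for var, pa in parents.items():
        key = tuple(assignment[q] for q in pa)   # values of Pa_i
        p1 = cpt[var][key]                       # p(var = 1 | Pa_i)
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"x1": 1, "x2": 1, "x3": 0}))  # 0.3 * 0.7 * (1 - 0.9)
```

Because each local factor is a proper conditional distribution, the eight joint probabilities sum to 1, which is exactly what the factorization in (6.32) guarantees.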