only utilizes the information from evidence. The general process for estimating a
parameter vector via the Bayesian method is as follows:
(1) Regard the unknown parameters as a random vector. This is the fundamental
difference between the Bayesian method and the traditional statistical approach.
(2) Define the prior π(θ) based on previous knowledge of the parameter θ. This step
is controversial and has been criticized by conventional statisticians.
(3) Calculate the posterior density and estimate the parameters according to
the posterior distribution.
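The three steps above can be sketched numerically. The example below is a minimal illustration, not from the text: it assumes a Bernoulli likelihood with 7 successes in 10 trials and a Beta(2, 2) prior, and computes the posterior on a grid.

```python
import numpy as np
from math import comb

theta = np.linspace(0.0, 1.0, 100001)      # (1) treat θ as a random quantity
dtheta = theta[1] - theta[0]
prior = 6.0 * theta * (1.0 - theta)        # (2) assumed prior π(θ) = Beta(2, 2)
likelihood = comb(10, 7) * theta**7 * (1.0 - theta)**3

post = prior * likelihood                  # (3) posterior ∝ π(θ) p(x|θ)
post /= np.sum(post) * dtheta              # normalize on the grid
theta_hat = np.sum(theta * post) * dtheta  # posterior-mean estimate

# The posterior is Beta(2+7, 2+3) = Beta(9, 5), whose mean is 9/14 ≈ 0.643.
print(round(theta_hat, 3))
```

The grid approximation is only for illustration; in this conjugate case the posterior has the closed form Beta(9, 5).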
In the second step, if there is no previous knowledge with which to determine the
prior π(θ) of a parameter, Bayes suggested assuming a uniform distribution. This is
called the Bayesian assumption. Intuitively, the Bayesian assumption is well
accepted; yet it runs into trouble when no information about the prior distribution
is available, especially when the parameter space is infinite. The Empirical Bayes
(EB) estimator combines the conventional statistical method with the Bayesian
method: it applies the conventional method to obtain the marginal density p(x),
and then determines the prior π(θ) from the following formula.
p(x) = ∫_{-∞}^{+∞} π(θ) p(x | θ) dθ
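As an illustration of this integral (my example, not the book's): for a Bernoulli likelihood with x successes in n trials and a uniform prior on [0, 1], the marginal density can be approximated on a grid.

```python
import numpy as np
from math import comb

theta = np.linspace(0.0, 1.0, 100001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)  # Bayesian assumption: uniform π(θ) on [0, 1]

def marginal(x, n):
    """Approximate p(x) = ∫ π(θ) p(x|θ) dθ by a Riemann sum."""
    likelihood = comb(n, x) * theta**x * (1.0 - theta)**(n - x)
    return float(np.sum(prior * likelihood) * dtheta)

# With a uniform prior this integral is exactly 1/(n+1) for every x in 0..n.
print(marginal(3, 10))
```

In EB estimation the direction is reversed: p(x) is first estimated from data by conventional methods, and the formula is then used to back out a prior π(θ) consistent with it.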
6.1.3 Applications of Bayesian networks in data mining
1. Bayesian method in classification and regression analysis
Classification assigns an object to a class based on its feature vector and some
constraints. In data mining, we mainly study how to learn classification rules from
data or experience. Sometimes each feature vector corresponds
to exactly one class label (determinate classification); sometimes classes
overlap: samples from different classes are very similar, and we can only
compute the probability that a sample belongs to each class and choose a class
according to those probabilities. The Bayesian school provides two methods for
this situation: one selects the class with the maximum posterior probability; the
other selects the class with the maximum utility function or minimum loss
function. Let the feature vector be X = (x_1, x_2, …, x_m) and the class vector be
C = (c_1, c_2, …, c_l). Classification is to assign a class c_i (i ∈ {1, …, l})
to a feature vector X.
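The first rule (maximum posterior probability) can be sketched as follows. This is a hedged example of my own, assuming a naive-Bayes model with independent Gaussian features and made-up class parameters; the text itself does not fix a likelihood model.

```python
import numpy as np

def map_classify(x, priors, means, stds):
    """Return the index i of the class c_i maximizing P(c_i) * p(x | c_i)."""
    posteriors = []
    for p, mu, sd in zip(priors, means, stds):
        # product of independent Gaussian densities over features x_1..x_m
        lik = np.prod(np.exp(-((x - mu) ** 2) / (2.0 * sd**2))
                      / (sd * np.sqrt(2.0 * np.pi)))
        posteriors.append(p * lik)
    return int(np.argmax(posteriors))

# Two hypothetical classes over a 2-feature vector X = (x_1, x_2)
priors = [0.5, 0.5]
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
stds   = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
print(map_classify(np.array([2.8, 3.1]), priors, means, stds))  # → 1
```

The second rule differs only in the decision step: instead of `argmax` over posteriors, one takes `argmax` of expected utility (or `argmin` of expected loss) computed from those posteriors.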