Probabilistic Reasoning - Advanced Artificial Intelligence


probabilistic correlation of variables.

(2) A Bayesian network can learn causal relations among variables. Causal
relations are a very important pattern in data mining, for two reasons: in data
analysis, they help in understanding domain knowledge, and they readily lead to
precise predictions even under heavy interference. For example, a sales analyst
may wonder whether increasing advertising will increase sales. To answer this,
the analyst must know whether increased advertising is a cause of increased
sales. With a Bayesian network, this question can be answered easily even
without experimental data, because the causal relation has been encoded in the
network.

(3) The combination of Bayesian networks and Bayesian statistics can take full
advantage of both domain knowledge and the information in data. Anyone with
modeling experience knows that prior information or domain knowledge is very
important to modeling, especially when sample data are sparse or hard to
obtain. Commercial expert systems constructed purely from the knowledge of
domain experts are a good example. A Bayesian network, which expresses
dependence relations with directed edges and uses probability distributions to
describe the strength of each dependence, integrates prior knowledge and
sample information well.

(4) Combining Bayesian networks with other models can effectively avoid the
over-fitting problem.
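
As a minimal sketch of point (3), the following Python fragment estimates one conditional probability table of a two-node network (advertising -> sales) by combining prior pseudo-counts with a handful of observations. The network, the Beta prior, the pseudo-counts, and the data are all hypothetical and chosen only for illustration; they do not come from the text.

```python
from fractions import Fraction

def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a Bernoulli parameter under a Beta(prior_a, prior_b) prior."""
    return Fraction(successes + prior_a, trials + prior_a + prior_b)

# Hypothetical prior knowledge, written as Beta pseudo-counts
# (sales rose, sales did not rise) for each state of the parent node.
prior = {True: (3, 1), False: (1, 3)}

# Hypothetical sparse sample: (times sales rose, number of observations).
data = {True: (2, 2), False: (0, 1)}

# P(sales rise | advertising state), blending prior and data.
cpt = {
    ad: posterior_mean(data[ad][0], data[ad][1], *prior[ad])
    for ad in (True, False)
}

print(cpt[True])   # 5/6 with the prior, versus 2/2 = 1 from the data alone
print(cpt[False])  # 1/5 with the prior, versus 0/1 = 0 from the data alone
```

With so few observations, the raw frequencies would give probabilities of exactly 1 and 0; the prior pulls the estimates away from these extremes, which is the sense in which prior knowledge compensates for sparse data.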

3. Bayesian method in clustering and pattern discovery

In general, clustering is a special case of model selection. Each clustering
pattern can be viewed as a model. The task of clustering is to find, among many
candidate models, the pattern that best fits the nature of the data, using
analysis and other strategies. The Bayesian method integrates prior knowledge
with the characteristics of the current data to select the best model.
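
The selection step can be illustrated with a short Python sketch that turns a prior over candidate clusterings and the evidence for each into posterior model probabilities. The function name `model_posteriors`, the candidate models, and the log-evidence values are hypothetical placeholders, not results from any real data set.

```python
import math

def model_posteriors(log_evidence, log_prior):
    """Return P(model | data) from log P(data | model) and log P(model)."""
    scores = {m: log_evidence[m] + log_prior[m] for m in log_evidence}
    top = max(scores.values())
    # Log-sum-exp keeps the normalisation numerically stable.
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {m: math.exp(s - log_norm) for m, s in scores.items()}

# Hypothetical log marginal likelihoods for three candidate clusterings.
log_evidence = {"2 clusters": -120.0, "3 clusters": -118.0, "4 clusters": -119.5}

# Uniform prior: no clustering is preferred before seeing the data.
log_prior = {m: math.log(1.0 / len(log_evidence)) for m in log_evidence}

posterior = model_posteriors(log_evidence, log_prior)
best = max(posterior, key=posterior.get)
print(best)
```

A non-uniform `log_prior` is where prior knowledge would enter, for example to favour clusterings with fewer classes.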

Using Bayesian analysis, Vaithyanathan et al. proposed a model-based
hierarchical clustering method (Vaithyanathan, 1998). By partitioning the
feature set, they organized the data into a hierarchical structure. Each
feature either has a distinct distribution in each class or shares the same
distribution across some classes. They also gave a method for determining the
model structure with the marginal likelihood, including how to automatically
determine the number of classes, the depth of the model tree, and the feature
subset of each class.
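
To make the role of the marginal likelihood concrete, here is a simplified Python sketch that decides whether one binary feature is better modeled with a separate distribution per class or with a single distribution shared by all classes. It assumes a Beta(1, 1) prior and Bernoulli features, which is an illustrative choice rather than the model used by Vaithyanathan et al.; the function names are likewise hypothetical.

```python
from math import lgamma

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def log_marginal(ones, zeros, a=1.0, b=1.0):
    """Log marginal likelihood of a binary sequence under a Beta(a, b) prior."""
    return log_beta(a + ones, b + zeros) - log_beta(a, b)

def feature_is_class_specific(counts_per_class):
    """Compare the two structures for one binary feature.

    counts_per_class holds one (ones, zeros) pair per class.
    Returns True when separate per-class distributions have the
    higher marginal likelihood, False when a shared one does.
    """
    separate = sum(log_marginal(o, z) for o, z in counts_per_class)
    total_ones = sum(o for o, _ in counts_per_class)
    total_zeros = sum(z for _, z in counts_per_class)
    shared = log_marginal(total_ones, total_zeros)
    return separate > shared

# The feature behaves very differently in the two classes.
print(feature_is_class_specific([(9, 1), (1, 9)]))  # True

# The feature behaves identically in the two classes.
print(feature_is_class_specific([(5, 5), (5, 5)]))  # False
```

Because the marginal likelihood integrates over the parameters, the structure with more parameters is not automatically preferred: it wins only when the data in the classes really differ.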

AutoClass is a typical system that implements clustering with the Bayesian
method. It automatically determines the number of classes and the complexity of
the model by searching all possible classifications in the model space. It
allows features in certain classes to have correlation and successive relation
