Information Technology Reference
probabilistic correlation of variables.
(2) A Bayesian network can learn the causal relations among variables. Causal
relations are a very important pattern in data mining, mainly because, in data
analysis, they help us understand domain knowledge, and they also make it easy
to give precise predictions even under intervention. For example, a sales
analyst may want to know whether increasing advertising will increase sales.
To answer this, the analyst must know whether increased advertising is a cause
of increased sales. With a Bayesian network, this question can be answered
easily even without experimental data, because the causal relations are already
encoded in the network.
(3) Combining Bayesian networks with Bayesian statistics makes full use of both
domain knowledge and the information in the data. Anyone with modeling
experience knows that prior information or domain knowledge is very important
to modeling, especially when sample data are sparse or hard to obtain. Some
commercial expert systems, built purely on the knowledge of domain experts, are
a perfect example of this. A Bayesian network, which represents dependencies
with directed edges and uses probability distributions to describe the strength
of each dependence, can integrate prior knowledge and sample information well.
(4) Combining Bayesian networks with other models can effectively avoid the
over-fitting problem.
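To make point (2) concrete, here is a minimal Python sketch of how an intervention query can be answered from a causal Bayesian network without running an experiment. It assumes a hypothetical three-node network in which a confounder, season (`Z`), influences both advertising (`A`) and sales (`S`); the variable names and all probabilities are invented for illustration. The interventional probability P(S=1 | do(A=a)) is computed with the standard back-door adjustment (truncated factorization): the edge into `A` is cut and `Z` is averaged over its prior.

```python
# Hypothetical CPTs for: Season -> Advertising, Season -> Sales,
# Advertising -> Sales. All variables are binary (0/1); numbers are invented.
P_SEASON = {1: 0.5, 0: 0.5}           # P(Z = z)
P_AD_GIVEN_SEASON = {1: 0.8, 0: 0.2}  # P(A = 1 | Z = z)
P_SALES_GIVEN = {                     # P(S = 1 | A = a, Z = z), keyed by (a, z)
    (1, 1): 0.9, (0, 1): 0.7,
    (1, 0): 0.4, (0, 0): 0.2,
}


def p_ad(a, z):
    """P(A = a | Z = z)."""
    p1 = P_AD_GIVEN_SEASON[z]
    return p1 if a == 1 else 1.0 - p1


def observational(a):
    """P(S=1 | A=a): seeing A=a also shifts our belief about Z (Bayes' rule)."""
    joint = {z: P_SEASON[z] * p_ad(a, z) for z in (0, 1)}
    norm = sum(joint.values())
    return sum(P_SALES_GIVEN[(a, z)] * joint[z] / norm for z in (0, 1))


def interventional(a):
    """P(S=1 | do(A=a)): cut the Z -> A edge and average over the prior of Z."""
    return sum(P_SALES_GIVEN[(a, z)] * P_SEASON[z] for z in (0, 1))


print(round(observational(1) - observational(0), 2))    # 0.5
print(round(interventional(1) - interventional(0), 2))  # 0.2
```

In this made-up network the observed difference in sales (0.5) overstates the causal effect of advertising (0.2), because good seasons bring both more advertising and more sales; the causal structure in the network is what lets the two quantities be separated.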
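For points (3) and (4), a common way a Bayesian network combines expert knowledge with sample data is to give each conditional probability table entry a Dirichlet prior whose hyperparameters act as imaginary "pseudo-counts". The sketch below assumes a hypothetical expert belief that P(sales up | ads up) is about 0.7, held with the weight of 10 imaginary cases, and compares the posterior mean with the maximum-likelihood estimate on a very small hypothetical sample.

```python
def posterior_mean(prior_counts, observed_counts):
    """Posterior mean of a categorical distribution under a Dirichlet prior.

    prior_counts: Dirichlet hyperparameters, read as expert 'pseudo-counts'.
    observed_counts: how often each state occurred in the sample data.
    """
    totals = [alpha + n for alpha, n in zip(prior_counts, observed_counts)]
    s = sum(totals)
    return [t / s for t in totals]


def max_likelihood(observed_counts):
    """Relative frequencies: the estimate that uses the data alone."""
    s = sum(observed_counts)
    return [n / s for n in observed_counts]


# Hypothetical expert belief: P(sales up | ads up) is about 0.7, held with the
# weight of 10 imaginary cases -> Dirichlet(7, 3) over [sales up, not up].
prior = [7.0, 3.0]
data = [0, 2]  # hypothetical sample: two real cases, both "not up"

print(max_likelihood(data))                                # [0.0, 1.0]
print([round(p, 3) for p in posterior_mean(prior, data)])  # [0.583, 0.417]
```

With only two real cases, maximum likelihood assigns probability 0 to "sales up", a typical symptom of over-fitting sparse data, while the posterior mean (7/12, about 0.583) stays between the expert's belief and the data; as real counts grow, they dominate the pseudo-counts.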
3. Bayesian method in clustering and pattern discovery
In general, clustering is a special case of model selection: each clustering
pattern can be viewed as a model, and the task of clustering is to find, among
many candidate models, the pattern that best fits the nature of the data, using
analysis and other strategies. The Bayesian method combines prior knowledge
with the characteristics of the current data to select the best model.
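The model-selection view can be illustrated with a small Python sketch that scores candidate clusterings of binary data by their log marginal likelihood. It assumes (these choices are illustrative, not taken from the source) that every cluster/feature pair has its own Bernoulli parameter with a Beta(1, 1) prior, which integrates out in closed form, and that all candidate clusterings are equally likely a priori, so the posterior ranking equals the marginal-likelihood ranking. The data and the candidate partitions are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(data, partition, a=1.0, b=1.0):
    """log P(data | partition) for binary features.

    Each cluster/feature pair gets its own Bernoulli parameter with a
    Beta(a, b) prior, integrated out analytically.
    """
    n_features = len(data[0])
    total = 0.0
    for cluster in partition:
        for j in range(n_features):
            ones = sum(data[i][j] for i in cluster)
            zeros = len(cluster) - ones
            total += log_beta(a + ones, b + zeros) - log_beta(a, b)
    return total


# Hypothetical data: six items described by three binary features.
data = [
    [1, 1, 0], [1, 1, 0], [1, 0, 0],
    [0, 0, 1], [0, 0, 1], [0, 1, 1],
]
# Each candidate clustering (a list of clusters of item indices) is one model.
candidates = {
    "one cluster": [[0, 1, 2, 3, 4, 5]],
    "two clusters (good)": [[0, 1, 2], [3, 4, 5]],
    "two clusters (bad)": [[0, 3, 4], [1, 2, 5]],
}
scores = {name: log_marginal(data, p) for name, p in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # two clusters (good)
```

The single-cluster model also outscores the badly split two-cluster model even though the latter has more parameters: the marginal likelihood averages over parameters rather than maximizing them, so extra complexity is rewarded only when the data support it.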
Using Bayesian analysis, Vaithyanathan et al. proposed a model-based
hierarchical clustering method (Vaithyanathan, 1998). By partitioning the
feature set, they organized the data into a hierarchical structure in which each
feature either has a distinct distribution in different classes or shares the
same distribution across some classes. They also gave a method for determining
the model structure by means of the marginal likelihood, including how to
automatically determine the number of classes, the depth of the model tree, and
the feature subset of each class.
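The decision whether a feature has its own distribution in each class or shares one distribution across classes can likewise be made by comparing marginal likelihoods. The following Python sketch is a simplified illustration, not Vaithyanathan et al.'s actual model: it assumes binary features with Beta(1, 1) priors, two fixed classes, and hypothetical counts given as (ones, total) per class.

```python
from math import lgamma


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_ml(ones, total, a=1.0, b=1.0):
    """log marginal likelihood of Bernoulli counts under a Beta(a, b) prior."""
    return log_beta(a + ones, b + total - ones) - log_beta(a, b)


def feature_is_class_specific(counts):
    """counts: list of (ones, total) pairs, one per class.

    Compares 'one distribution shared by all classes' against
    'a separate distribution in each class' by marginal likelihood.
    """
    shared = log_ml(sum(o for o, _ in counts), sum(t for _, t in counts))
    separate = sum(log_ml(o, t) for o, t in counts)
    return separate > shared


# Hypothetical counts for two features observed in two classes.
features = {
    "f_discriminative": [(9, 10), (1, 10)],
    "f_noise": [(5, 10), (5, 10)],
}
for name, counts in features.items():
    print(name, feature_is_class_specific(counts))
# f_discriminative True
# f_noise False
```

A feature whose frequency differs sharply between classes earns a class-specific distribution, while a feature that behaves the same in both classes is better explained by one shared distribution, so it would not enter the class-specific feature subset.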
AutoClass is a typical system that implements clustering with the Bayesian
method. It automatically determines the number of classes and the complexity of
the model by searching all possible classifications in the model space. It
allows features in certain classes to have correlation and successive relation