motivating and constructing prior probabilities was not adequately answered,
Bayesian theory was not well accepted at that time. Early in the 20th century,
B. de Finetti and H. Jeffreys made significant contributions to Bayesian theory.
After World War II, A. Wald proposed statistical decision theory, in which
Bayesian methods played an important role. The development of information
science also contributed to the revival of Bayesian theory. In 1958, Bayes'
paper was republished by Biometrika, the oldest statistical journal in Britain.
In the 1950s, H. Robbins suggested combining the empirical Bayesian approach
with conventional statistical methods. The novel approach attracted the
attention of the statistical research community, soon demonstrated its merits,
and became an active research direction.
With the development of artificial intelligence, especially after the rise of
machine learning and data mining, Bayesian theory has seen further development
and wider application, and its scope has broadened greatly since its origin.
In the 1980s, Bayesian networks were used for knowledge representation in
expert systems. In the 1990s, Bayesian networks were applied to data mining and
machine learning. Recently, more and more papers concerning Bayesian theory
have been published, covering most fields of artificial intelligence, including
causal reasoning, uncertain knowledge representation, pattern recognition,
cluster analysis, and so on. The International Society for Bayesian Analysis
(ISBA) and its journal now focus specifically on progress in Bayesian theory.
6.1.2 Basic concepts of the Bayesian method
In Bayesian theory, all kinds of uncertainty are represented with probabilities,
and learning and reasoning are implemented via probabilistic rules. The results
of Bayesian learning are distributions over random variables, which express the
degrees of belief in the various possible outcomes. The foundations of the
Bayesian school are Bayes' theorem and the Bayesian assumption. Bayes' theorem
connects the prior probabilities of events with their posterior probabilities.
Assume the joint probability density of a random vector x and a parameter
vector θ is p(x, θ), and that p(x) and π(θ) are the marginal densities of x and
θ respectively. In common cases, x is an observation vector and θ is an unknown
parameter vector. The estimate of the parameter θ can be obtained from the
observation vector via Bayes' theorem, which is as follows:
$$
p(\theta \mid x) = \frac{\pi(\theta)\, p(x \mid \theta)}{p(x)} = \frac{\pi(\theta)\, p(x \mid \theta)}{\int \pi(\theta)\, p(x \mid \theta)\, d\theta} \qquad (6.1)
$$

where $\pi(\theta)$ is the prior of $\theta$.
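As a concrete illustration (not from the text), the posterior in Eq. (6.1) can
be approximated numerically on a grid, with the integral in the denominator
replaced by a Riemann sum. The coin-flip data, the uniform prior, and the
Bernoulli likelihood below are all illustrative assumptions.

```python
import numpy as np

# Hypothetical example: estimate the bias theta of a coin from observed flips.
# Prior pi(theta): uniform on (0, 1); likelihood p(x | theta): Bernoulli.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])    # illustrative data (1 = heads)

theta = np.linspace(0.001, 0.999, 999)    # grid over the parameter space
dtheta = theta[1] - theta[0]              # grid spacing
prior = np.ones_like(theta)               # pi(theta): uniform prior
likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())

# Equation (6.1): posterior = prior * likelihood / normalizing integral,
# with the integral approximated by a Riemann sum over the grid.
numerator = prior * likelihood
posterior = numerator / (numerator.sum() * dtheta)

print("posterior mean of theta:", (theta * posterior).sum() * dtheta)
```

With 6 heads in 8 flips and a flat prior, the posterior is Beta(7, 3), so the
printed mean is approximately 0.7, matching the closed-form conjugate analysis.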
From the formula above, we see that in the Bayesian method the estimation of a
parameter draws on both prior information about the parameter and the
information carried by the evidence. In contrast, traditional statistical
methods, e.g. maximum likelihood, use only the information provided by the
observations.