only utilizes the information from evidence. The general process for estimating a
parameter vector via the Bayesian method is as follows:
(1) Regard the unknown parameters as a random vector. This is the fundamental
difference between the Bayesian method and the traditional statistical approach.
(2) Define the prior π(θ) based on previous knowledge of the parameter θ. This step
is controversial and has been criticized by conventional statisticians.
(3) Calculate the posterior density and estimate the parameters according to
the posterior distribution.
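The three steps above can be sketched numerically. The example below is a minimal illustration, not from the text: it assumes a Bernoulli likelihood with 7 successes in 10 trials and a Beta(2, 2) prior, and computes the posterior on a grid.

```python
import numpy as np
from math import comb

theta = np.linspace(0.0, 1.0, 100001)      # (1) treat θ as a random quantity
dtheta = theta[1] - theta[0]
prior = 6.0 * theta * (1.0 - theta)        # (2) assumed prior π(θ) = Beta(2, 2)
likelihood = comb(10, 7) * theta**7 * (1.0 - theta)**3

post = prior * likelihood                  # (3) posterior ∝ π(θ) p(x|θ)
post /= np.sum(post) * dtheta              # normalize on the grid
theta_hat = np.sum(theta * post) * dtheta  # posterior-mean estimate

# The posterior is Beta(2+7, 2+3) = Beta(9, 5), whose mean is 9/14 ≈ 0.643.
print(round(theta_hat, 3))
```

The grid approximation is only for illustration; in this conjugate case the posterior has the closed form Beta(9, 5).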
In the second step, if there is no previous knowledge with which to determine the
prior π(θ) of a parameter, Bayes suggested assuming a uniform distribution. This is
called the Bayesian assumption. Intuitively, the Bayesian assumption is well
accepted; yet it runs into trouble when no information about the prior distribution
is available, especially when the parameter space is infinite. The Empirical Bayes
(EB) estimator combines the conventional statistical method with the Bayesian
method: it applies the conventional method to obtain the marginal density p(x),
and then determines the prior π(θ) from the following formula.
p(x) = ∫_{-∞}^{+∞} π(θ) p(x | θ) dθ
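As an illustration of this integral (my example, not the book's): for a Bernoulli likelihood with x successes in n trials and a uniform prior on [0, 1], the marginal density can be approximated on a grid.

```python
import numpy as np
from math import comb

theta = np.linspace(0.0, 1.0, 100001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)  # Bayesian assumption: uniform π(θ) on [0, 1]

def marginal(x, n):
    """Approximate p(x) = ∫ π(θ) p(x|θ) dθ by a Riemann sum."""
    likelihood = comb(n, x) * theta**x * (1.0 - theta)**(n - x)
    return float(np.sum(prior * likelihood) * dtheta)

# With a uniform prior this integral is exactly 1/(n+1) for every x in 0..n.
print(marginal(3, 10))
```

In EB estimation the direction is reversed: p(x) is first estimated from data by conventional methods, and the formula is then used to back out a prior π(θ) consistent with it.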
6.1.3 Applications of Bayesian networks in data mining
1. Bayesian method in classification and regression analysis
Classification assigns an object to a class based on its feature vector and some
constraints. In data mining, we mainly study how to learn classification rules from
data or experience. Sometimes each feature vector corresponds
to exactly one class label (determinate classification); sometimes classes
overlap: samples from different classes are very similar, and we can only
compute the probability that a sample belongs to each class and choose a class
according to those probabilities. The Bayesian school provides two methods for
this situation: one selects the class with the maximum posterior probability; the
other selects the class with the maximum utility function or minimum loss
function. Let the feature vector be X = (x_1, x_2, …, x_m) and the class vector be
C = (c_1, c_2, …, c_l). Classification is to assign a class c_i (i ∈ {1, …, l})
to a feature vector X.
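The first rule (maximum posterior probability) can be sketched as follows. This is a hedged example of my own, assuming a naive-Bayes model with independent Gaussian features and made-up class parameters; the text itself does not fix a likelihood model.

```python
import numpy as np

def map_classify(x, priors, means, stds):
    """Return the index i of the class c_i maximizing P(c_i) * p(x | c_i)."""
    posteriors = []
    for p, mu, sd in zip(priors, means, stds):
        # product of independent Gaussian densities over features x_1..x_m
        lik = np.prod(np.exp(-((x - mu) ** 2) / (2.0 * sd**2))
                      / (sd * np.sqrt(2.0 * np.pi)))
        posteriors.append(p * lik)
    return int(np.argmax(posteriors))

# Two hypothetical classes over a 2-feature vector X = (x_1, x_2)
priors = [0.5, 0.5]
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
stds   = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
print(map_classify(np.array([2.8, 3.1]), priors, means, stds))  # → 1
```

The second rule differs only in the decision step: instead of `argmax` over posteriors, one takes `argmax` of expected utility (or `argmin` of expected loss) computed from those posteriors.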