In the case of a single variable with multiple parameters (a single variable with multiple possible states), X is commonly regarded as a continuous variable with a Gaussian distribution. Assume its probability density is p(x | θ); then we have:

$$p(x \mid \theta) = (2\pi\nu)^{-1/2}\, e^{-(x-\mu)^2 / (2\nu)}$$

where θ = {µ, ν}.
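As a quick sanity check on this density, here is a minimal sketch in plain Python (the function name and example values are illustrative, not from the text):

```python
import math

def gaussian_density(x, mu, nu):
    """Evaluate p(x | theta) for theta = {mu, nu}, where nu is the variance."""
    return math.exp(-(x - mu) ** 2 / (2 * nu)) / math.sqrt(2 * math.pi * nu)

# Density of a standard Gaussian (mu = 0, nu = 1) at x = 0: about 0.3989.
print(gaussian_density(0.0, 0.0, 1.0))
```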
Similar to the previous approach for the binomial distribution, we first assign a prior over the parameters and then solve for the posterior given the data D = {X_1 = x_1, X_2 = x_2, …, X_N = x_N} via Bayes' theorem:

$$P(\theta \mid D) = p(D \mid \theta)\, p(\theta) / p(D)$$
Next, we use the expectation over the posterior as the prediction:

$$p(x_{N+1} \mid D) = \int p(x_{N+1} \mid \theta)\, p(\theta \mid D)\, d\theta \qquad (6.17)$$

For the exponential family, this computation is efficient and has a closed form. In the multi-sample case, if the observed value of X is discrete, the Dirichlet distribution can be used as the prior, which simplifies the computation.
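To illustrate the simplification, the following sketch applies the Dirichlet–multinomial conjugate pair, under which the integral in Eq. (6.17) collapses to a closed form (the states and pseudo-count values are illustrative assumptions):

```python
from collections import Counter

def dirichlet_predictive(data, alphas):
    """Posterior predictive P(X_{N+1} = k | D) under a Dirichlet(alphas) prior.

    The integral of Eq. (6.17) reduces to (alpha_k + N_k) / (sum(alphas) + N),
    so no numerical integration is needed.
    """
    counts = Counter(data)
    n = len(data)
    total_alpha = sum(alphas.values())
    return {k: (a + counts.get(k, 0)) / (total_alpha + n) for k, a in alphas.items()}

# Three discrete states with a symmetric Dirichlet(1, 1, 1) prior, five observations.
print(dirichlet_predictive(["a", "a", "b", "a", "c"], {"a": 1, "b": 1, "c": 1}))
# {'a': 0.5, 'b': 0.25, 'c': 0.25}
```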
The computational learning mechanism of Bayes' theorem is to take the weighted average of the expectation of the prior distribution and the sample mean, where the higher the precision, the bigger the weight. Provided that the prior is a conjugate distribution, the posterior can be used as the prior in the next round of computation, so that it can be integrated with subsequently obtained sample information. If this process is repeated again and again, the effect of the sample becomes increasingly prominent. Because the Bayesian method integrates prior information and posterior information, it avoids both the subjective bias of using only prior information and the blind searching and computation that occur when sample information is limited; it also avoids the effect of noise that comes from utilizing only posterior information. It is therefore suitable for data-mining problems with statistical features and for knowledge-discovery problems, especially those where samples are hard to collect or the cost of collecting them is high. The key to effective learning with the Bayesian method is determining the prior reasonably and precisely. Currently there are only some principles for prior determination; no complete, operational theory for determining priors exists, and in many cases the reasonableness and precision of a prior distribution are hard to evaluate. Further research is required to solve these problems.
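This weighted-average behaviour, and the reuse of the posterior as the next round's prior, can be seen concretely in the conjugate Gaussian case with known variance; the following is a minimal sketch under those assumptions (the batch values and precisions are illustrative, not from the text):

```python
def update_normal(prior_mean, prior_prec, sample_mean, n, obs_prec):
    """One conjugate update for a Gaussian mean with known observation precision.

    The posterior mean is the precision-weighted average of the prior mean and
    the sample mean: the higher the precision, the bigger the weight.
    """
    post_prec = prior_prec + n * obs_prec
    post_mean = (prior_prec * prior_mean + n * obs_prec * sample_mean) / post_prec
    return post_mean, post_prec

# Feed each batch's posterior back in as the next round's prior.
mean, prec = 0.0, 1.0                      # initial prior
for batch in ([1.2, 0.8], [1.1, 0.9, 1.0]):
    sample_mean = sum(batch) / len(batch)
    mean, prec = update_normal(mean, prec, sample_mean, len(batch), obs_prec=4.0)
print(mean, prec)  # the sample's influence grows with each round
```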
6.4 Naïve Bayesian Learning Model
In naïve Bayesian learning models, a training sample I is decomposed into a feature vector X and a decision class variable C. Here, it is assumed that all the components of the feature vector are independent given the decision variable. In other words, the class-conditional distribution factorizes as $P(X \mid C) = \prod_i P(X_i \mid C)$.
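As a rough illustration of this factorization, here is a sketch with hypothetical hand-set conditional tables (the class names, feature values, and probabilities are illustrative, not a model from the text):

```python
def naive_bayes_score(x, c, cond_probs, class_prior):
    """P(C = c) * prod_i P(X_i = x_i | C = c), the naive Bayes joint score."""
    score = class_prior[c]
    for i, xi in enumerate(x):
        score *= cond_probs[c][i].get(xi, 1e-9)  # tiny floor for unseen values
    return score

# Two classes, two binary features, with hand-set conditional tables.
class_prior = {"pos": 0.5, "neg": 0.5}
cond_probs = {
    "pos": [{0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7}],
    "neg": [{0: 0.7, 1: 0.3}, {0: 0.6, 1: 0.4}],
}
scores = {c: naive_bayes_score((1, 1), c, cond_probs, class_prior)
          for c in class_prior}
print(max(scores, key=scores.get))  # "pos"
```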