6.3.2 Computational learning
Learning means that a system can improve its behavior as it runs. Is the posterior distribution obtained via the Bayesian formula better than the corresponding prior? What is its learning mechanism? Here we take the normal distribution as an example and study the effect of the prior information and the sample data by changing the parameters.
Let $x_1, x_2, \ldots, x_n$ be a sample from the normal distribution $N(\theta, \sigma^2)$, where $\sigma^2$ is known and $\theta$ is unknown. To seek $\tilde{\theta}$, the estimation of $\theta$, we take another normal distribution as the prior of $\theta$, that is, $\pi(\theta) = N(\mu_0, \sigma_0^2)$.
The resulting posterior distribution of $\theta$ is also a normal distribution:

$$h(\theta \mid x) = N(\mu_1, \sigma_1^2)$$

where

$$\mu_1 = \left(\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right) \Big/ \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right), \qquad \sigma_1^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
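To make this conjugate update concrete, the following is a minimal Python sketch. The function name normal_posterior and its argument names are our own illustration, not notation from the text; the formulas inside are exactly the ones above.

from statistics import mean

def normal_posterior(mu0, var0, var, xs):
    """Posterior N(mu1, sigma_1^2) for the mean of N(theta, sigma^2).

    mu0, var0 : prior mean and variance (mu_0, sigma_0^2)
    var       : known sampling variance (sigma^2)
    xs        : observed sample x_1, ..., x_n (must be non-empty)
    """
    n = len(xs)
    xbar = mean(xs)
    # Precisions (reciprocal variances) of the prior mean and the sample mean.
    prior_prec = 1.0 / var0
    data_prec = n / var
    # Posterior precision is the sum of the two precisions; the posterior
    # mean is the precision-weighted average of mu0 and xbar.
    var1 = 1.0 / (prior_prec + data_prec)
    mu1 = (prior_prec * mu0 + data_prec * xbar) / (prior_prec + data_prec)
    return mu1, var1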
Taking $\mu_1$, the expectation of the posterior $h(\theta \mid x)$, as the estimation of $\theta$, we have:

$$\tilde{\theta} = E(\theta \mid x) = \frac{\dfrac{1}{\sigma_0^2}\,\mu_0 + \dfrac{n}{\sigma^2}\,\bar{x}}{\dfrac{1}{\sigma_0^2} + \dfrac{n}{\sigma^2}} \qquad (6.8)$$
Therefore $\tilde{\theta}$, the estimation of $\theta$, is the weighted average of $\mu_0$, the expectation of the prior, and $\bar{x}$, the sample mean. $\sigma_0^2$ is the variance of $N(\mu_0, \sigma_0^2)$, so its reciprocal, $1/\sigma_0^2$, is the precision of $\mu_0$. Similarly, $\sigma^2/n$ is the variance of the sample mean $\bar{x}$, so its reciprocal, $n/\sigma^2$, is the precision of $\bar{x}$. Hence we see that $\tilde{\theta}$ is the weighted average of $\mu_0$ and $\bar{x}$, where the weights are their respective precisions: the smaller the variance, the bigger the weight. Moreover, the bigger the sample size $n$, the smaller the variance $\sigma^2/n$ and the bigger the weight of the sample mean, so when $n$ is quite large the effect of the prior mean becomes very small. The above analysis illustrates that the posterior obtained from the Bayesian formula integrates the prior information and the sample data; the result is more reasonable than one based merely on the prior information or the sample data alone. The learning mechanism is therefore effective. Analysis based on other conjugate prior distributions leads to similar results.
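As an illustrative check of this weighting, the snippet below reuses normal_posterior from the sketch above; the prior, the true mean, and the variance are invented numbers chosen only for the example.

import random

# Hypothetical setup: prior N(0, 1); data drawn around theta = 5
# with known sampling variance sigma^2 = 4.
random.seed(0)
for n in (1, 10, 1000):
    xs = [random.gauss(5.0, 2.0) for _ in range(n)]
    mu1, var1 = normal_posterior(mu0=0.0, var0=1.0, var=4.0, xs=xs)
    print(n, round(mu1, 3), round(var1, 5))
# As n grows, mu1 moves from near the prior mean 0 toward the sample
# mean (about 5), and the posterior variance shrinks toward 0, matching
# the precision-weighting argument above.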
According to the previous discussion, with a conjugate prior we can use the posterior information as the prior of the next computation and seek the next posterior by integrating further sample information. If we repeat this process time after time, the estimation keeps absorbing new sample data and improving, which is exactly the learning behavior described at the beginning of this section.
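A short sketch of this iteration, again reusing normal_posterior and invented data values: feeding the posterior back in as the next prior gives the same answer as processing all the samples at once, which is what makes the repeated updating coherent.

batch1 = [4.2, 5.1, 4.8]
batch2 = [5.4, 4.9]

# Sequential: the posterior from batch1 becomes the prior for batch2.
mu, var0 = 0.0, 1.0
mu, var0 = normal_posterior(mu, var0, var=4.0, xs=batch1)
mu_seq, var_seq = normal_posterior(mu, var0, var=4.0, xs=batch2)

# Batch: all samples in a single update from the original prior.
mu_all, var_all = normal_posterior(0.0, 1.0, var=4.0, xs=batch1 + batch2)

# The two routes agree up to floating-point rounding.
assert abs(mu_seq - mu_all) < 1e-12 and abs(var_seq - var_all) < 1e-12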