are often combined. Several common methods for selecting a prior distribution are
listed below. Before discussing these methods, we give some definitions.
Let θ be the parameter of a model, X = (x_1, x_2, …, x_n) be the observed data, and
π(θ) be the prior distribution of θ. π(θ) represents the belief about the parameter
when no evidence exists. l(x_1, x_2, …, x_n | θ) ∝ p(x_1, x_2, …, x_n | θ) is the
likelihood function. It represents the belief about the unknown data when the
parameter θ is known. h(θ | x_1, x_2, …, x_n) = p(θ | x_1, x_2, …, x_n) is the belief
about the parameter after new evidence appears. Bayes' theorem describes the
relation among them:

$$
h(\theta \mid x_1, x_2, \ldots, x_n)
= \frac{\pi(\theta)\, p(x_1, x_2, \ldots, x_n \mid \theta)}
       {\int \pi(\theta)\, p(x_1, x_2, \ldots, x_n \mid \theta)\, d\theta}
\propto \pi(\theta)\, l(x_1, x_2, \ldots, x_n \mid \theta)
\qquad (6.6)
$$
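As an illustration of Equation (6.6), the following minimal sketch approximates the posterior on a grid of θ values. It assumes i.i.d. Bernoulli observations and a uniform prior, neither of which comes from the text; the function name `grid_posterior` and the data are hypothetical, and the integral in the denominator is replaced by a midpoint Riemann sum.

```python
def grid_posterior(data, prior, grid):
    """Approximate h(theta | x_1..x_n) on an evenly spaced grid via Eq. (6.6)."""
    unnorm = []
    for theta in grid:
        # p(x_1..x_n | theta) for i.i.d. Bernoulli data (an assumption of this sketch)
        likelihood = 1.0
        for x in data:
            likelihood *= theta if x == 1 else (1.0 - theta)
        # Numerator of (6.6): pi(theta) * p(x_1..x_n | theta)
        unnorm.append(prior(theta) * likelihood)
    # Denominator of (6.6): the integral over theta, approximated by a Riemann sum
    step = grid[1] - grid[0]
    evidence = sum(unnorm) * step
    return [u / evidence for u in unnorm]


# Midpoints of 1000 equal cells covering (0, 1)
grid = [(i + 0.5) / 1000 for i in range(1000)]
step = grid[1] - grid[0]

data = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical observations: 6 ones, 2 zeros
posterior = grid_posterior(data, lambda theta: 1.0, grid)  # uniform prior

# Posterior mean, again as a Riemann sum
posterior_mean = sum(t * h for t, h in zip(grid, posterior)) * step
print(posterior_mean)
```

With a uniform prior and this data, the exact posterior is Beta(7, 3), so the printed mean should be close to 0.7. The grid approach is only practical for low-dimensional θ.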
Definition 6.5
Kernel of Distribution Density: If f(x), the distribution density of a
random variable, can be decomposed as f(x) = cg(x), where c is a constant
independent of x, we call g(x) the kernel of f(x), written f(x) ∝ g(x). If we know
the kernel of a distribution density, we can determine the corresponding constant
from the fact that the integral of the density over the whole space is
1. Therefore, the key to finding the distribution density of a random variable is to
find the kernel of its density.
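To make the role of the constant concrete, here is a small sketch that recovers c from a kernel by numerical integration. The kernel g(x) = x^2 (1 - x)^3 on [0, 1] is an illustrative choice (the kernel of a Beta(3, 4) density), not an example from the text, and `normalizing_constant` is a hypothetical helper using the midpoint rule.

```python
def normalizing_constant(kernel, lower, upper, n=10000):
    """Return c such that c * kernel(x) integrates to 1 over [lower, upper]."""
    step = (upper - lower) / n
    # Midpoint rule approximation of the integral of the kernel
    integral = sum(kernel(lower + (i + 0.5) * step) for i in range(n)) * step
    return 1.0 / integral


def kernel(x):
    # Illustrative kernel g(x) of a Beta(3, 4) density on [0, 1]
    return x**2 * (1 - x) ** 3


c = normalizing_constant(kernel, 0.0, 1.0)
print(c)


def density(x):
    # The full density f(x) = c * g(x)
    return c * kernel(x)
```

For this kernel the exact constant is 1/B(3, 4) = 60, so the printed value should be very close to 60.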
Definition 6.6
Sufficient Statistic: For a parameter θ, the statistic t(x_1, x_2, …, x_n) is
sufficient if the posterior distribution of θ, h(θ | x_1, x_2, …, x_n), is always a
function of θ and t(x_1, x_2, …, x_n), regardless of the prior distribution.
This definition states that the information about θ in the data can be represented
by its sufficient statistics. Sufficient statistics are the connection between the
posterior distribution and the data. Below, we give a theorem for judging whether a
statistic is sufficient.
Theorem 6.3 The Neyman-Fisher Factorization Theorem: Let f_θ(x) be the
density or mass function for the random vector x, parametrized by the vector θ.
The statistic t = T(x) is sufficient for θ if and only if there exist functions a(x)
(not depending on θ) and b_θ(t) such that f_θ(x) = a(x) b_θ(t) for all possible
values of x.
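The factorization can be checked numerically in a simple case. The sketch below assumes i.i.d. Bernoulli data with known sample size n, for which the joint mass function is θ^t (1 - θ)^(n - t) with t the sum of the observations, so a(x) = 1 and b_θ(t) = θ^t (1 - θ)^(n - t). The model, the uniform prior, the data, and both function names are illustrative assumptions. Two samples with the same t should give the same posterior, whether it is computed from the raw data or from t alone.

```python
def posterior_from_data(data, grid):
    """Posterior masses on a grid, computed from the raw observations."""
    unnorm = []
    for theta in grid:
        like = 1.0
        for x in data:
            like *= theta if x == 1 else 1.0 - theta
        unnorm.append(like)  # uniform prior, so prior factor is 1
    total = sum(unnorm)
    return [u / total for u in unnorm]


def posterior_from_statistic(t, n, grid):
    """Posterior masses on a grid, computed only from t = sum(x_i) and n."""
    # b_theta(t) = theta^t * (1 - theta)^(n - t); a(x) = 1 cancels on normalizing
    unnorm = [theta**t * (1.0 - theta) ** (n - t) for theta in grid]
    total = sum(unnorm)
    return [u / total for u in unnorm]


grid = [i / 100 for i in range(1, 100)]  # theta in (0, 1)

data_a = [1, 1, 0, 0, 1]  # hypothetical sample, t = 3
data_b = [0, 1, 1, 1, 0]  # different order, same t = 3

post_a = posterior_from_data(data_a, grid)
post_b = posterior_from_data(data_b, grid)
post_t = posterior_from_statistic(sum(data_a), len(data_a), grid)

# Largest pointwise differences; both should be at floating-point noise level
max_diff_ab = max(abs(p - q) for p, q in zip(post_a, post_b))
max_diff_at = max(abs(p - q) for p, q in zip(post_a, post_t))
print(max_diff_ab, max_diff_at)
```

Note that the values returned here are probability masses over the grid points (they sum to 1), not density values.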
6.3.1 Common methods for prior distribution selection
1. Conjugate family of distributions