Raiffa and Schlaifer suggested using conjugate distributions as prior
distributions, where the posterior distribution is the same kind of
distribution as the corresponding prior. The general definition of a
conjugate distribution is as follows:
Definition 6.7
Let the conditional distribution of samples x1, x2, …, xn given
parameter θ be p(x1, x2, …, xn | θ). If the prior density function π(θ) and the
resulting posterior density function π(θ|x) belong to the same family, the prior
density function π(θ) is said to be conjugate to the conditional distribution p(x|θ).
Definition 6.8
Let P = {p(x|θ): θ ∈ Θ} be a family of density functions with
parameter θ, and let H be a family of prior distributions π(θ) for θ. If for any
given p ∈ P and π ∈ H the resulting posterior distribution π(θ|x) is always in the
family H, then H is said to be a conjugate family for P.
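As a concrete illustration of Definition 6.8 (a standard textbook example, not
drawn from the text above), consider the Beta family as a prior for a Bernoulli
likelihood:

    π(θ) ∝ θ^(α−1) (1−θ)^(β−1)              (prior: Beta(α, β))
    p(x1, …, xn | θ) = θ^k (1−θ)^(n−k)       (k successes in n Bernoulli trials)
    π(θ|x) ∝ θ^(α+k−1) (1−θ)^(β+n−k−1)       (posterior: Beta(α+k, β+n−k))

The posterior is again a Beta distribution, so the Beta family H is conjugate to
the Bernoulli family P.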
When the density function of the data distribution and the prior density are
both exponential functions, their product is the same kind of exponential
function; the only difference is a proportionality factor. So we
have:
Theorem 6.4
If the kernel of the density function f(x) of a random variable X is an
exponential function, then the density function belongs to a conjugate family.
All distributions whose kernel is an exponential function form the exponential
family, which includes the binomial distribution, multinomial distribution, normal
distribution, Gamma distribution, Poisson distribution, and Dirichlet distribution.
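In canonical form (a standard parameterization stated here for reference, not
taken from the text), a member of the exponential family can be written as

    p(x|θ) = h(x) exp(η(θ)·T(x) − A(θ)),

where T(x) is the sufficient statistic, η(θ) the natural parameter, and A(θ)
the log normalizer; the factor exp(η(θ)·T(x)) is the exponential kernel
referred to in Theorem 6.4.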
Conjugate distributions provide a reasonable synthesis of historical trials
and a reasonable starting point for future trials. Computation with a
non-conjugate distribution is rather difficult, whereas computation with a
conjugate distribution is easy: only multiplication with the prior is required.
In fact, the conjugate family lays a firm foundation for the practical
application of Bayesian learning.
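To make the last point concrete, here is a minimal Python sketch of a conjugate
update (the Beta-binomial pairing and all names are illustrative assumptions,
not taken from the text): the posterior follows from the prior by nothing more
than combining its parameters with the observed counts.

    # Minimal sketch of a conjugate Bayesian update (Beta prior, binomial data).
    def beta_binomial_update(alpha, beta, successes, failures):
        """Multiply a Beta(alpha, beta) prior by a binomial likelihood.

        Because the Beta family is conjugate to the binomial likelihood,
        the posterior is again a Beta distribution; the computation
        amounts to adding the observed counts to the prior's parameters.
        """
        return alpha + successes, beta + failures

    # Prior Beta(2, 2); historical trials: 7 successes, 3 failures.
    a_post, b_post = beta_binomial_update(2, 2, 7, 3)
    print(f"posterior: Beta({a_post}, {b_post})")         # Beta(9, 5)
    print("posterior mean:", a_post / (a_post + b_post))  # 9/14 ≈ 0.643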
2. Principle of maximum entropy
In information theory, entropy is used to quantify the uncertainty of an event.
Suppose a random variable x takes one of two possible values, a and b, and
compare the following two cases:
(1) p(x = a) = 0.98, p(x = b) = 0.02
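A short Python sketch (illustrative, not part of the original text) computes
the Shannon entropy of case (1); the highly skewed probabilities yield an
entropy near zero, i.e. very little uncertainty. A uniform 0.5/0.5 distribution
is added only as a hypothetical contrast with maximum uncertainty.

    import math

    def shannon_entropy(probs):
        """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Case (1): a highly skewed distribution -> low uncertainty.
    print(shannon_entropy([0.98, 0.02]))  # ~0.141 bits

    # Hypothetical contrast: a uniform distribution -> maximum uncertainty.
    print(shannon_entropy([0.5, 0.5]))    # 1.0 bit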