(2) p(x = a) = 0.45, p(x = b) = 0.55
Obviously, the uncertainty in case 1 is much less than that in case 2. Intuitively, the uncertainty reaches its maximum when the probabilities of the two values are equal.
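As a quick check (a worked computation added here, using the entropy introduced in Definition 6.9 below), case 2 comes close to, but does not reach, the maximum attained by equal probabilities:

H(0.45, 0.55) = -(0.45 \ln 0.45 + 0.55 \ln 0.55) \approx 0.688 < \ln 2 \approx 0.693 = H(0.5, 0.5).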
Definition 6.9 Let x be a discrete random variable taking at most countably many values a_1, a_2, …, a_k, …, with p(x = a_i) = p_i, i = 1, 2, …. The entropy of x is

H(x) = -\sum_i p_i \ln p_i .

For a continuous random variable x with density p(x), if the integral

H(x) = -\int p(x) \ln p(x) \, dx

is meaningful, the integral is called the entropy of the continuous random variable.
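As a small illustration of Definition 6.9 (a sketch added here, not part of the original text), the Python snippet below computes the discrete entropy of the two-valued distribution from case 2 and approximates the entropy of a standard normal density by numerical integration; the function names `discrete_entropy` and `continuous_entropy` are my own.

```python
import math

def discrete_entropy(probs):
    """H(x) = -sum_i p_i ln p_i for a discrete distribution (with 0 ln 0 := 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def continuous_entropy(density, lo, hi, steps=100_000):
    """Approximate H(x) = -integral of p(x) ln p(x) dx over [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = density(x)
        if p > 0:
            total += p * math.log(p) * dx
    return -total

if __name__ == "__main__":
    # Case 2 from the text: p(x = a) = 0.45, p(x = b) = 0.55
    print(discrete_entropy([0.45, 0.55]))   # about 0.688, slightly below ln 2
    print(discrete_entropy([0.5, 0.5]))     # ln 2 (about 0.693), the two-value maximum

    # Standard normal density; its exact entropy is 0.5 * ln(2*pi*e), about 1.4189
    normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(continuous_entropy(normal, -10.0, 10.0))
```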
According to the definition, two random variables with the same distribution have equal entropy, so entropy depends only on the distribution.
Principle of maximum entropy: In the absence of prior information, the best prior distribution for the parameters is the one that maximizes the entropy.
It can be proved that the entropy of a random variable (or vector) reaches its maximum if and only if its distribution is uniform. Hence the Bayesian assumption, which takes the non-informative prior to be uniform, conforms to the principle of maximum entropy: it makes the entropy of the random variable (or vector) maximal. Below is the proof for the case of a random variable taking finitely many values.
Theorem 6.5 Let random variable x take finitely many values a_1, a_2, …, a_n, with corresponding probabilities p_1, p_2, …, p_n. The entropy H(x) is maximal if and only if p_1 = p_2 = … = p_n = 1/n.
Proof Consider

G(p_1, p_2, …, p_n) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).

To find its maximum, we set the partial derivative of G with respect to each p_i to 0, which gives the equations:
\frac{\partial G}{\partial p_i} = -\ln p_i - 1 + \lambda = 0 \quad (i = 1, 2, …, n).

Solving these equations gives p_1 = p_2 = … = p_n. Because \sum_{i=1}^{n} p_i = 1, we have p_1 = p_2 = … = p_n = 1/n.
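As a quick numerical check of Theorem 6.5 (an illustration I am adding, not from the original text), the following Python sketch draws random probability vectors of length n and verifies that none of them exceeds the entropy ln n of the uniform distribution; the helper names `entropy` and `random_distribution` are my own.

```python
import math
import random

def entropy(probs):
    """H = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def random_distribution(n):
    """A random probability vector of length n (normalized positive weights)."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

if __name__ == "__main__":
    n = 5
    bound = math.log(n)  # entropy of the uniform distribution on n values
    print("uniform:", entropy([1.0 / n] * n), "ln n:", bound)
    # Every randomly drawn distribution should satisfy H <= ln n.
    assert all(entropy(random_distribution(n)) <= bound + 1e-12 for _ in range(10_000))
    print("all random distributions had entropy <= ln n")
```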