(2) p(x = a) = 0.45, p(x = b) = 0.55
Obviously, the uncertainty in case 1 is much less than that in case 2. Intuitively, the uncertainty reaches its maximum when the probabilities of the two values are equal.
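As a quick numerical illustration (a minimal sketch: case 1 is not shown on this page, so the probabilities 0.01 and 0.99 used for it below are assumed purely for illustration), the entropy defined in Definition 6.9 can be computed directly for both cases:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * ln p), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Case 2 is the distribution given in the text; case 1's probabilities
# are assumed here only to illustrate a highly skewed distribution.
case1 = [0.01, 0.99]
case2 = [0.45, 0.55]

print(entropy(case1))       # ~0.056 nats: low uncertainty
print(entropy(case2))       # ~0.688 nats: close to the maximum
print(entropy([0.5, 0.5]))  # ln 2 ~ 0.693 nats: maximum for two values
```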
Definition 6.9 Let x be a discrete random variable taking at most countably many values a_1, a_2, …, a_k, …, with p(x = a_i) = p_i, i = 1, 2, …. The entropy of x is

\[ H(x) = -\sum_i p_i \ln p_i . \]

For a continuous random variable x, if the integral

\[ H(x) = -\int p(x) \ln p(x) \, dx \]

is meaningful, where p(x) is the density of x, then the integral is called the entropy of the continuous random variable.
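As an illustration of the continuous case (a minimal sketch: the standard normal density and the integration range [-10, 10] are chosen here only as an example), the integral can be approximated numerically and compared with the known closed form 0.5 ln(2πe):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution (used here only as an example)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def differential_entropy(pdf, lo, hi, n=100_000):
    """Approximate H(x) = -integral of p(x) ln p(x) dx by a midpoint Riemann sum."""
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0:
            h -= p * math.log(p) * dx
    return h

# For a standard normal the exact entropy is 0.5 * ln(2*pi*e) ~ 1.4189 nats.
print(differential_entropy(normal_pdf, -10.0, 10.0))
print(0.5 * math.log(2 * math.pi * math.e))
```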
According to the definition, two random variables with the same distribution have equal entropy, so entropy depends only on the distribution.
Principle of maximum entropy: In the absence of prior information, the best prior distribution is the one that makes the entropy of the parameters maximum.
It can be proved that the entropy of a random variable (or vector) reaches its maximum if and only if its distribution is uniform. Hence the Bayesian assumption, which takes the non-informative prior to be uniform, fits the principle of maximum entropy: it makes the entropy of the random variable (or vector) maximum. A quick numerical check of this claim is given below, followed by the proof for the case of a random variable taking finitely many values.
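The check below is a minimal sketch (the choice n = 5 and the random sampling scheme are only for illustration): it draws many distributions over n values at random and confirms that none exceeds the entropy of the uniform distribution, ln n.

```python
import math
import random

def entropy(probs):
    """Shannon entropy H = -sum(p * ln p), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

n = 5
print(entropy([1.0 / n] * n), math.log(n))   # both equal ln 5 ~ 1.609

# No randomly drawn distribution over n values exceeds the uniform entropy.
best = 0.0
for _ in range(10_000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    best = max(best, entropy([x / s for x in w]))
print(best <= math.log(n) + 1e-12)           # True
```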
Theorem 6.5 Let random variable x take finitely many values a_1, a_2, …, a_n with corresponding probabilities p_1, p_2, …, p_n. The entropy H(x) is maximum if and only if p_1 = p_2 = … = p_n = 1/n.
Proof Consider

\[ G(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right) . \]

To find its maximum, we set the partial derivative of G with respect to each p_i to 0 and obtain the equations:
\[ 0 = \frac{\partial G}{\partial p_i} = -\ln p_i - 1 + \lambda , \qquad i = 1, 2, \ldots, n . \]
Solving these equations gives p_1 = p_2 = … = p_n. Since the probabilities sum to 1, we have p_1 = p_2 = … = p_n = 1/n.
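The same constrained maximization can be checked numerically. The sketch below assumes NumPy and SciPy are available; the choice n = 4 and the SLSQP solver are only for illustration. It maximizes the entropy subject to the constraint that the probabilities sum to 1 and recovers the uniform solution p_i = 1/n.

```python
import numpy as np
from scipy.optimize import minimize

n = 4

def neg_entropy(p):
    # Minimize the negative entropy, i.e. maximize H(p) = -sum p_i ln p_i.
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))

constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]  # sum p_i = 1
bounds = [(0.0, 1.0)] * n
p0 = np.random.dirichlet(np.ones(n))  # arbitrary starting distribution

res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x)                 # ~ [0.25, 0.25, 0.25, 0.25], i.e. p_i = 1/n
print(-res.fun, np.log(n))   # the maximum entropy equals ln n
```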