\[
D(p \,\|\, q) = \sum_{x} p(x) \, \log \frac{p(x)}{q(x)}
\]
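As a quick numerical illustration (not part of the original text), the discrete divergence can be evaluated directly from this definition; the distributions p and q below are arbitrary examples chosen only to show that the value is positive when p differs from q and zero when they coincide. Natural logarithms are used here; a different base only rescales the result.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete entropic divergence D(p || q) = sum_x p(x) log(p(x)/q(x)).

    Terms with p(x) = 0 contribute 0 by the usual convention 0 log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example distributions (assumed, purely for illustration)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(kl_divergence(p, q))   # strictly positive
print(kl_divergence(p, p))   # exactly 0 when the distributions coincide
```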
We know from information theory [198] that the entropic divergence D between any two probability distributions is never negative. Moreover, it can be extended to continuous probability distributions in a natural way by setting (while keeping its non-negativity property):
\[
D(p \,\|\, q) = \int_{-\infty}^{+\infty} p(x) \, \ln \frac{p(x)}{q(x)} \, dx .
\]
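A minimal numerical sketch of the continuous form, again only as an illustration: here p is taken to be a normal density and q a Laplace density with the same variance (both choices are assumptions made for the example), and the integral is approximated by a Riemann sum on a grid.

```python
import numpy as np

sigma = 1.0
x = np.linspace(-20, 20, 200001)
dx = x[1] - x[0]

# p: normal density with variance sigma^2
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# q: Laplace density with the same variance (scale b chosen so that 2*b^2 = sigma^2)
b = sigma / np.sqrt(2)
q = np.exp(-np.abs(x) / b) / (2 * b)

# D(p || q) = integral of p(x) ln(p(x)/q(x)) dx, approximated on the grid
D = np.sum(p * np.log(p / q)) * dx
print(D)   # positive; it would be 0 only if p and q coincided
```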
On the basis of the non-negativity of D, Table 4.8 shows that the normal distribution of variance σ² has the maximum entropy in the class of probability distributions with the same variance.
Table 4.8  The continuous entropy of distributions with variance σ² reaches the maximum value for the normal distribution of variance σ² (f denotes any probability distribution of variance σ²)
For the normal density
\[
N(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-x^2/(2\sigma^2)} ,
\]
the divergence between f and N expands as
\[
\begin{aligned}
D(f \,\|\, N) &= \int_{-\infty}^{+\infty} f(x) \, \ln \frac{f(x)}{N(x)} \, dx \\
&= \int_{-\infty}^{+\infty} f(x) \ln f(x) \, dx \;-\; \int_{-\infty}^{+\infty} f(x) \ln N(x) \, dx \\
&= \int_{-\infty}^{+\infty} f(x) \ln f(x) \, dx \;-\; \int_{-\infty}^{+\infty} f(x) \, \ln \frac{e^{-x^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \, dx \\
&= -S(f) \;-\; \int_{-\infty}^{+\infty} f(x) \, \ln e^{-x^2/(2\sigma^2)} \, dx \;+\; \frac{1}{2}\ln(2\pi\sigma^2) \int_{-\infty}^{+\infty} f(x) \, dx \\
&= -S(f) \;+\; \frac{1}{2\sigma^2} \int_{-\infty}^{+\infty} f(x) \, x^2 \, dx \;+\; \frac{1}{2}\ln(2\pi\sigma^2) \\
&= -S(f) \;+\; \frac{1}{2\sigma^2}\,\mathrm{Var}(f) \;+\; \frac{1}{2}\ln(2\pi\sigma^2) \\
&\le -S(f) \;+\; \frac{1}{2} \;+\; \frac{1}{2}\ln(2\pi\sigma^2) \qquad \text{(since } \mathrm{Var}(f) \le \sigma^2 \text{)} \\
&= -S(f) \;+\; \frac{1}{2}\bigl(\ln e + \ln(2\pi\sigma^2)\bigr) \\
&= -S(f) \;+\; \frac{1}{2}\ln\bigl(2\pi e \sigma^2\bigr) .
\end{aligned}
\]
Thus D(f‖N) ≤ −S(f) + (1/2) ln(2πeσ²); but D(f‖N) ≥ 0, therefore
\[
S(f) \le \frac{1}{2}\ln\bigl(2\pi e \sigma^2\bigr) .
\]
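To make the bound in Table 4.8 concrete, the following sketch (an assumed example for illustration, not part of the original) computes the continuous entropy S(f) of a Laplace density with variance σ² and compares it with the value (1/2) ln(2πeσ²) attained by the normal density of the same variance.

```python
import numpy as np

sigma = 1.0
x = np.linspace(-30, 30, 300001)
dx = x[1] - x[0]

# f: Laplace density with variance sigma^2 (scale b chosen so that 2*b^2 = sigma^2)
b = sigma / np.sqrt(2)
f = np.exp(-np.abs(x) / b) / (2 * b)

# Continuous entropy S(f) = -integral f(x) ln f(x) dx, approximated on the grid
S_f = -np.sum(f * np.log(f)) * dx

# Maximum-entropy bound attained by the normal density with variance sigma^2
bound = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(S_f, bound, S_f <= bound)   # the Laplace entropy stays below the Gaussian value
```

With σ = 1 the Laplace entropy is about 1.347 nats, below the Gaussian value of about 1.419 nats, in agreement with the inequality S(f) ≤ (1/2) ln(2πeσ²).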