Digital Signal Processing Reference
In-Depth Information
M
q
i
n
i
!
P
M
(
n
1
,
n
2
,...,
n
M
/
q
1
,...,
q
M
)
=
N
!
with
q
i
priors
,
i
=
1
M
n
i
N
n
i
=
N
and
p
i
=
(9.1)
i
=
1
·
√
2
n
n
e
−
n
Stirling formula gives
n
!≈
·
π
·
n
when
n
→+∞
. We could then
observe that it converges to discrete version of Kullback-Leibler:
p
i
log
p
i
q
i
M
1
N
Lim
log [
P
M
]
=
=
K
(
p
,
q
)
(9.2)
N
→+∞
i
=
1
Based on variational approach, Donsker and Varadhan gave variational definition of
Kullback divergence:
Sup
E
p
(φ)
−
e
φ
)
K
(
p
,
q
)
=
log
E
q
(
(9.3)
Consider:
log
p
(ω)
φ(ω)
=
q
(ω)
log
ω
log
p
log
E
q
(
e
φ
)
=
(ω)
p
(ω)
⇒
E
p
(φ)
−
p
(ω)
−
q
(ω)
q
(ω)
q
(ω)
ω
=
K
(
p
,
q
)
−
log
(
1
)
=
K
(
p
,
q
)
This proves that the supremum over all
φ
is no smaller than the divergence.
E
p
log
e
φ
E
q
(
log
q
φ
(ω)
q
log
E
q
(
e
φ
)
=
E
p
(φ)
−
=
p
(ω)
e
φ
)
(ω)
ω
)
−
E
p
(φ)
−
log
E
q
(
e
φ
)
=
ω
e
φ(ω)
q
(ω)
with
q
φ
(ω)
=
e
φ(θ)
⇒
K
(
p
,
q
p
(ω)
θ
q
(θ)
log
p
(ω)
q
φ
(ω)
0 using the divergence inequality.
In the same context, link with “Large Deviation Theory” and Fenchel-Legendre
transform which gives that logarithm of generating function are dual to Kullback
Divergence. This relation is given by:
≥