A.2 Kullback-Leibler Divergence
The Kullback-Leibler (KL) divergence (also called relative entropy) is a discrepancy measure between two probability distributions $p$ and $q$. For discrete distributions, with $p$ and $q$ representing PMFs, the KL divergence is denoted and defined as:
\[
D_{\mathrm{KL}}(p \,\|\, q) \equiv D_{\mathrm{KL}}(p(x) \,\|\, q(x)) = \sum_{x} p(x) \ln \frac{p(x)}{q(x)} .
\tag{A.4}
\]
For continuous distributions, $p$ and $q$ represent densities and the summation is replaced by an integral.
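As a concrete illustration, the following Python sketch (hypothetical, using only NumPy; the PMF values are made up for the example) computes the discrete KL divergence of (A.4):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p||q) = sum_x p(x) ln(p(x)/q(x)), in nats.
    Terms with p(x) = 0 contribute zero, by the usual 0 ln 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two example PMFs on a three-symbol alphabet (illustrative values only).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # approximately 0.0253 nats
```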
We may also write $D_{\mathrm{KL}}(p \,\|\, q)$ as
\[
D_{\mathrm{KL}}(p \,\|\, q) = -\sum_{x} p(x) \ln q(x) + \sum_{x} p(x) \ln p(x) = H_S(p, q) - H_S(p) ,
\tag{A.5}
\]
where $H_S(p)$ is the (Shannon) entropy of the distribution $p$ and $H_S(p, q)$ is the cross-entropy between $p$ and $q$.
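The decomposition in (A.5) is easy to check numerically; the self-contained sketch below (hypothetical, reusing the illustrative PMFs from the previous example) confirms that the cross-entropy minus the entropy equals the KL divergence:

```python
import numpy as np

# Illustrative PMFs (no zero entries, so no special handling is needed).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

h_p  = -np.sum(p * np.log(p))      # Shannon entropy H_S(p)
h_pq = -np.sum(p * np.log(q))      # cross-entropy H_S(p, q)
d_kl =  np.sum(p * np.log(p / q))  # KL divergence D_KL(p||q), as in (A.4)

print(np.isclose(d_kl, h_pq - h_p))  # True: (A.5) holds
```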
Note that the KL divergence is not a metric distance because it does not satisfy the symmetry property, $D_{\mathrm{KL}}(p \,\|\, q) \neq D_{\mathrm{KL}}(q \,\|\, p)$, nor the triangle inequality. However, it has some interesting properties, namely $D_{\mathrm{KL}}(p \,\|\, q) \geq 0$, $\forall\, p(x), q(x)$, and $D_{\mathrm{KL}}(p \,\|\, q) = 0$ iff $p(x) = q(x)$, $\forall\, x$.
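Both properties can be observed with the illustrative PMFs used above (again a hypothetical sketch, assuming PMFs with no zero entries):

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl(p, q))  # ~0.0253: non-negative
print(kl(q, p))  # ~0.0258: differs from kl(p, q), so no symmetry
print(kl(p, p))  # 0.0: the divergence vanishes iff the distributions coincide
```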
From (A.4) we observe that
\[
D_{\mathrm{KL}}(p \,\|\, q) = E_p\!\left[\ln \frac{p(x)}{q(x)}\right],
\tag{A.6}
\]
with $E_p$ denoting an expectation relative to the distribution $p$. We are then able to compute the empirical estimate (resubstitution estimate)
\[
\hat{D}_{\mathrm{KL}}(p \,\|\, q) = \frac{1}{n} \sum_{i=1}^{n} \ln \frac{p(x_i)}{q(x_i)} .
\tag{A.7}
\]
Since $\hat{D}_{\mathrm{KL}}(p \,\|\, q)$ is an empirical measure of the discrepancy between $p(x)$ and $q(x)$, its minimization can be applied to finding a distribution $q$ approximating another distribution $p$.
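As a rough illustration of (A.7), the hypothetical sketch below (assuming two Gaussian densities and a sample drawn from $p$, using NumPy and SciPy) averages the log-likelihood ratio over the sample and compares it with the closed-form divergence between the two Gaussians:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed example: p = N(0, 1), q = N(0.5, 1); the x_i are drawn i.i.d. from p.
x = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Resubstitution estimate (A.7): (1/n) * sum_i ln[p(x_i) / q(x_i)].
d_kl_hat = np.mean(norm.logpdf(x, loc=0.0, scale=1.0) -
                   norm.logpdf(x, loc=0.5, scale=1.0))

# For equal-variance Gaussians, D_KL(p||q) = (mu_p - mu_q)^2 / (2 sigma^2) = 0.125.
print(d_kl_hat)  # close to 0.125
```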
A.3 Equivalence of ML and KL Empirical Estimates
Let us assume we use (A.7), instead of the ML method, for the estimation of the parameter vector $\theta$ mentioned in Sect. A.1. The distribution of the i.i.d. $x_i$ is $p(x; \theta_0)$ with $\theta_0$ unknown. We estimate it by attempting to find