A.2 Kullback-Leibler Divergence
The Kullback-Leibler (KL) divergence (also called relative entropy) is a discrepancy measure between two probability distributions $p$ and $q$. For discrete distributions, with $p$ and $q$ representing PMFs, the KL divergence is denoted and defined as:
\[
D_{KL}(p \,\|\, q) = D_{KL}(p(x) \,\|\, q(x)) = \sum_x p(x) \ln \frac{p(x)}{q(x)} \, . \tag{A.4}
\]
For continuous distributions, $p$ and $q$ represent densities and the summation is replaced by an integral.
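As a concrete illustration of (A.4), the following minimal Python sketch computes the discrete KL divergence directly from the definition; the PMFs p and q below are made-up examples, not taken from the text:

    import math

    def kl_divergence(p, q):
        """Discrete KL divergence, formula (A.4): sum_x p(x) ln(p(x)/q(x)).
        p and q are dictionaries mapping outcomes to probabilities."""
        total = 0.0
        for x, px in p.items():
            if px > 0.0:                      # terms with p(x) = 0 contribute nothing
                total += px * math.log(px / q[x])
        return total

    # Two hypothetical PMFs over the same support {0, 1, 2}
    p = {0: 0.5, 1: 0.3, 2: 0.2}
    q = {0: 0.4, 1: 0.4, 2: 0.2}
    print(kl_divergence(p, q))                # a small positive number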
We may also write $D_{KL}(p \,\|\, q)$ as
\[
D_{KL}(p \,\|\, q) = -\sum_x p(x) \ln q(x) + \sum_x p(x) \ln p(x) = H_S(p, q) - H_S(p) \, , \tag{A.5}
\]
where $H_S(p)$ is the (Shannon) entropy of the distribution $p$ and $H_S(p, q)$ is the cross-entropy between $p$ and $q$.
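The decomposition (A.5) can be checked numerically. The sketch below (Python assumed, with the same hypothetical PMFs as before) computes the cross-entropy and the Shannon entropy separately and verifies that their difference equals the direct sum in (A.4):

    import math

    def shannon_entropy(p):
        """H_S(p) = -sum_x p(x) ln p(x), in nats."""
        return -sum(px * math.log(px) for px in p.values() if px > 0.0)

    def cross_entropy(p, q):
        """H_S(p, q) = -sum_x p(x) ln q(x)."""
        return -sum(px * math.log(q[x]) for x, px in p.items() if px > 0.0)

    # Same hypothetical PMFs as before
    p = {0: 0.5, 1: 0.3, 2: 0.2}
    q = {0: 0.4, 1: 0.4, 2: 0.2}

    kl_direct = sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0.0)
    kl_via_entropies = cross_entropy(p, q) - shannon_entropy(p)
    print(abs(kl_direct - kl_via_entropies) < 1e-12)   # True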
Note that the KL divergence is not a metric distance because it does not satisfy the symmetry property, $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$, nor the triangle inequality. However, it has some interesting properties, namely $D_{KL}(p \,\|\, q) \geq 0$, $\forall\, p(x), q(x)$, and $D_{KL}(p \,\|\, q) = 0$ iff $p(x) = q(x)$, $\forall\, x$.
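A quick numerical illustration of these properties, again with the hypothetical PMFs used above (Python assumed): both divergences are non-negative, but they differ, showing the lack of symmetry:

    import math

    def kl(p, q):
        return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0.0)

    p = {0: 0.5, 1: 0.3, 2: 0.2}
    q = {0: 0.4, 1: 0.4, 2: 0.2}
    print(kl(p, q), kl(q, p))   # two different non-negative values: D_KL(p||q) != D_KL(q||p)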
From (A.4) we observe that
\[
D_{KL}(p \,\|\, q) = E_p\!\left[ \ln \frac{p(x)}{q(x)} \right] , \tag{A.6}
\]
with $E_p$ denoting an expectation relative to the distribution $p$. We are then able to compute the empirical estimate (resubstitution estimate)
\[
D_{KL}(p \,\|\, q) = \frac{1}{n} \sum_{i=1}^{n} \ln \frac{p(x_i)}{q(x_i)} \, . \tag{A.7}
\]
Since $D_{KL}(p \,\|\, q)$ computed by (A.7) is an empirical measure of the discrepancy between $p(x)$ and $q(x)$, its minimization can be applied to finding a distribution $q$ approximating another distribution $p$.
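As a sketch of how the resubstitution estimate (A.7) can be computed in practice, the following Python example uses two hypothetical unit-variance Gaussian densities ($p$ is the true sampling density, $q$ a candidate approximation); the names gauss_pdf and kl_resubstitution are illustrative, not from the text. The average log-density ratio over an i.i.d. sample from $p$ approaches the closed-form value $(\mu_p - \mu_q)^2 / 2 = 0.125$:

    import math
    import random

    def gauss_pdf(x, mu, sigma):
        """Density of N(mu, sigma^2) at x."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    def kl_resubstitution(sample, p_pdf, q_pdf):
        """Empirical KL estimate (A.7): (1/n) sum_i ln(p(x_i)/q(x_i))."""
        return sum(math.log(p_pdf(x) / q_pdf(x)) for x in sample) / len(sample)

    random.seed(0)
    n = 100_000
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # i.i.d. draws from p = N(0, 1)

    p_pdf = lambda x: gauss_pdf(x, 0.0, 1.0)              # true density p
    q_pdf = lambda x: gauss_pdf(x, 0.5, 1.0)              # candidate density q = N(0.5, 1)

    print(kl_resubstitution(sample, p_pdf, q_pdf))        # approximately 0.125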
A.3 Equivalence of ML and KL Empirical Estimates
Let us assume we use (A.7), instead of the ML method, for the estimation of the parameter vector $\theta$ mentioned in Sect. A.1. The distribution of the i.i.d. $x_i$ is $p(x; \theta_0)$ with $\theta_0$ unknown. We estimate it by attempting to find
 