A.2 Kullback-Leibler Divergence
The Kullback-Leibler (KL) divergence (also called relative entropy) is a discrepancy measure between two probability distributions $p$ and $q$. For discrete distributions, with $p$ and $q$ representing PMFs, the KL divergence is denoted and defined as:
\[
D_{\mathrm{KL}}(p \,\|\, q) \equiv D_{\mathrm{KL}}(p(x) \,\|\, q(x)) = \sum_{x} p(x) \ln \frac{p(x)}{q(x)} .
\tag{A.4}
\]
For continuous distributions, $p$ and $q$ represent densities and the summation is replaced by an integral.
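As a concrete illustration, the following Python sketch (hypothetical, using only NumPy; the PMF values are made up for the example) computes the discrete KL divergence of (A.4):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p||q) = sum_x p(x) ln(p(x)/q(x)), in nats.
    Terms with p(x) = 0 contribute zero, by the usual 0 ln 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two example PMFs on a three-symbol alphabet (illustrative values only).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # approximately 0.0253 nats
```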
We may also write $D_{\mathrm{KL}}(p \,\|\, q)$ as
\[
D_{\mathrm{KL}}(p \,\|\, q) = -\sum_{x} p(x) \ln q(x) + \sum_{x} p(x) \ln p(x) = H_S(p, q) - H_S(p) ,
\tag{A.5}
\]
where $H_S(p)$ is the (Shannon) entropy of the distribution $p$ and $H_S(p, q)$ is the cross-entropy between $p$ and $q$.
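The decomposition in (A.5) is easy to check numerically; the self-contained sketch below (hypothetical, reusing the illustrative PMFs from the previous example) confirms that the cross-entropy minus the entropy equals the KL divergence:

```python
import numpy as np

# Illustrative PMFs (no zero entries, so no special handling is needed).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

h_p  = -np.sum(p * np.log(p))      # Shannon entropy H_S(p)
h_pq = -np.sum(p * np.log(q))      # cross-entropy H_S(p, q)
d_kl =  np.sum(p * np.log(p / q))  # KL divergence D_KL(p||q), as in (A.4)

print(np.isclose(d_kl, h_pq - h_p))  # True: (A.5) holds
```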
Note that the KL divergence is not a metric distance because it does not satisfy the symmetry property, $D_{\mathrm{KL}}(p \,\|\, q) \neq D_{\mathrm{KL}}(q \,\|\, p)$, nor the triangle inequality. However, it has some interesting properties, namely $D_{\mathrm{KL}}(p \,\|\, q) \geq 0$, $\forall\, p(x), q(x)$, and $D_{\mathrm{KL}}(p \,\|\, q) = 0$ iff $p(x) = q(x)$, $\forall\, x$.
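Both properties can be observed with the illustrative PMFs used above (again a hypothetical sketch, assuming PMFs with no zero entries):

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl(p, q))  # ~0.0253: non-negative
print(kl(q, p))  # ~0.0258: differs from kl(p, q), so no symmetry
print(kl(p, p))  # 0.0: the divergence vanishes iff the distributions coincide
```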
From (A.4) we observe that
\[
D_{\mathrm{KL}}(p \,\|\, q) = E_p\!\left[\ln \frac{p(x)}{q(x)}\right],
\tag{A.6}
\]
with $E_p$ denoting an expectation relative to the distribution $p$. We are then able to compute the empirical estimate (resubstitution estimate)
\[
\hat{D}_{\mathrm{KL}}(p \,\|\, q) = \frac{1}{n} \sum_{i=1}^{n} \ln \frac{p(x_i)}{q(x_i)} .
\tag{A.7}
\]
Since $\hat{D}_{\mathrm{KL}}(p \,\|\, q)$ is an empirical measure of the discrepancy between $p(x)$ and $q(x)$, its minimization can be applied to finding a distribution $q$ approximating another distribution $p$.
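As a rough illustration of (A.7), the hypothetical sketch below (assuming two Gaussian densities and a sample drawn from $p$, using NumPy and SciPy) averages the log-likelihood ratio over the sample and compares it with the closed-form divergence between the two Gaussians:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed example: p = N(0, 1), q = N(0.5, 1); the x_i are drawn i.i.d. from p.
x = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Resubstitution estimate (A.7): (1/n) * sum_i ln[p(x_i) / q(x_i)].
d_kl_hat = np.mean(norm.logpdf(x, loc=0.0, scale=1.0) -
                   norm.logpdf(x, loc=0.5, scale=1.0))

# For equal-variance Gaussians, D_KL(p||q) = (mu_p - mu_q)^2 / (2 sigma^2) = 0.125.
print(d_kl_hat)  # close to 0.125
```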
A.3 Equivalence of ML and KL Empirical Estimates
Let us assume we use (A.7), instead of the ML method, for the estimation of the parameter vector $\theta$ mentioned in Sect. A.1. The distribution of the i.i.d. $x_i$ is $p(x; \theta_0)$ with $\theta_0$ unknown. We estimate it by attempting to find