Appendix A
Maximum Likelihood and Kullback-Leibler Divergence
A.1 Maximum Likelihood
Let us consider a set $D_n = \{x_i;\; i = 1, \ldots, n\}$ of $n$ observations of a
random variable $X$ distributed according to a PDF belonging to a family
$\{p(x; \theta)\}$ with unknown parameter vector $\theta \in \Theta$. The maximum likelihood
(ML) method provides an estimate of $\theta$ that best supports the observed set
$D_n$ in the sense of maximizing the likelihood $p(D_n \mid \theta)$, which for any
$\theta$ is given by the joint density value
\[
p(D_n \mid \theta) = p(x_1, x_2, \ldots, x_n \mid \theta). \tag{A.1}
\]
Let us assume that the observations are i.i.d. realizations of $p(x; \theta)$. The
likelihood can then be written as
\[
p(D_n \mid \theta) = p(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} p(x_i; \theta). \tag{A.2}
\]
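As an illustration (an example added here, with the exponential family chosen for
concreteness), if each observation follows the exponential density
$p(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, then (A.2) becomes
\[
p(D_n \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
= \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr).
\]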
Note that $p(D_n \mid \theta)$ is a function of the parameter vector $\theta$ and, as a function
of $\theta$, it is \emph{not} a probability density function (for instance, its integral may
differ from 1). Since the logarithm function is monotonic and given the exponential
form of many common distributions, it is usually more convenient
to maximize the log-likelihood instead of (A.2). We then search for
\[
\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta \mid D_n),
\quad \text{with} \quad
L(\theta \mid D_n) = \sum_{i=1}^{n} \ln p(x_i; \theta). \tag{A.3}
\]
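Continuing the illustrative exponential example introduced after (A.2), the
log-likelihood (A.3) reads
\[
L(\lambda \mid D_n) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i,
\]
and setting its derivative with respect to $\lambda$ to zero yields the estimate
$\hat{\lambda} = n / \sum_{i=1}^{n} x_i$, the reciprocal of the sample mean.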
For discrete distributions one can still apply formula (A.3), interpreting the
$p(x_i; \theta)$ as PMF values. The method is quite appealing and has excellent
mathematical properties, especially for large $n$ (see, e.g., [136]).
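In practice the maximization in (A.3) is often carried out numerically rather than in
closed form. The following sketch (added here for illustration; the synthetic data,
the Gaussian family, and the helper name neg_log_likelihood are assumptions, not part
of the original text) minimizes the negative log-likelihood with a general-purpose
optimizer and compares the result with the closed-form Gaussian ML estimates.

# Sketch: numerical ML estimation for a Gaussian p(x; mu, sigma).
# All names and data below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic observations D_n

def neg_log_likelihood(params, data):
    # Negative of L(theta | D_n) in (A.3); sigma is optimized on a log scale
    # so that it stays positive.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Maximize the log-likelihood by minimizing its negative.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Closed-form Gaussian ML estimates for comparison: the sample mean and the
# (biased) sample standard deviation.
print(mu_hat, sigma_hat)
print(x.mean(), x.std())

The numerical optimum agrees, up to the optimizer's tolerance, with the closed-form
estimates, which provides a convenient sanity check in cases where (A.3) has no
analytical solution.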