Appendix A
Maximum Likelihood and Kullback-Leibler Divergence
A.1 Maximum Likelihood
Let us consider a set $D_n = \{x_i;\ i = 1, \ldots, n\}$ of $n$ observations of a random variable $X$ distributed according to a PDF belonging to a family $\{p(x; \theta)\}$ with unknown parameter vector $\theta \in \Theta$. The maximum likelihood (ML) method provides an estimate of $\theta$ that best supports the observed set $D_n$, in the sense of maximizing the likelihood $p(D_n \,|\, \theta)$, which for any $\theta$ is given by the joint density value
$$
p(D_n \,|\, \theta) = p(x_1, x_2, \ldots, x_n \,|\, \theta). \tag{A.1}
$$
Let us assume that the observations are i.i.d. realizations of $p(x; \theta)$. The likelihood can then be written as
$$
p(D_n \,|\, \theta) = p(x_1, x_2, \ldots, x_n \,|\, \theta) = \prod_{i=1}^{n} p(x_i; \theta). \tag{A.2}
$$
Note that $p(D_n \,|\, \theta)$ is a function of the parameter vector $\theta$ and, as a function of $\theta$, it is not a probability density function (for instance, its integral may differ from 1). Since the logarithm function is monotonic, and given the exponential form of many common distributions, it is usually more convenient to maximize the log-likelihood instead of (A.2). We then search for
$$
\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta \,|\, D_n), \quad \text{with} \quad L(\theta \,|\, D_n) = \sum_{i=1}^{n} \ln p(x_i; \theta). \tag{A.3}
$$
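As an illustration of (A.3) (a standard worked example; the Gaussian model with known variance is an assumption made here for illustration, not taken from the text), consider i.i.d. observations with density $p(x; \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$, where $\sigma$ is known and the mean $\mu$ is the unknown parameter. Maximizing the log-likelihood recovers the sample mean:
$$
L(\mu \,|\, D_n) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2,
\qquad
\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
\;\Longrightarrow\;
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i.
$$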
For discrete distributions one can still apply formula (A.3), interpreting the $p(x_i; \theta)$ as PMF values. The method is quite appealing and has excellent mathematical properties, especially for large $n$ (see, e.g., [136]).
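The maximization in (A.3) can also be carried out numerically. The following is a minimal sketch (the exponential model, sample size, and variable names are assumptions made for illustration, not part of the text): it evaluates the log-likelihood of i.i.d. exponential observations over a grid of candidate parameters and picks the maximizer, which should agree with the closed-form MLE $\hat{\theta} = 1/\bar{x}$.

import numpy as np

# Assumed illustrative setup: i.i.d. samples from an exponential density
# p(x; theta) = theta * exp(-theta * x) with true parameter theta_true.
rng = np.random.default_rng(0)
theta_true = 2.0
x = rng.exponential(scale=1.0 / theta_true, size=1000)  # the observed set D_n

def log_likelihood(theta, x):
    # L(theta | D_n) = sum_i ln p(x_i; theta) for the exponential model.
    return np.sum(np.log(theta) - theta * x)

# Maximize L(theta | D_n) over a grid of candidate parameters, as in (A.3).
grid = np.linspace(0.1, 5.0, 1000)
theta_hat = grid[np.argmax([log_likelihood(t, x) for t in grid])]

print(theta_hat)        # close to theta_true = 2.0
print(1.0 / x.mean())   # closed-form MLE for comparison

The grid search keeps the sketch dependency-free; for models without a closed-form solution one would typically minimize the negative log-likelihood with a numerical optimizer instead.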