Appendix A
Maximum Likelihood and Kullback-Leibler Divergence
A.1 Maximum Likelihood
Let us consider a set $D_n = \{x_i;\; i = 1, \ldots, n\}$ of $n$ observations of a
random variable $X$ distributed according to a PDF belonging to a family
$\{p(x; \theta)\}$ with unknown parameter vector $\theta \in \Theta$. The maximum likelihood
(ML) method provides an estimate of $\theta$ that best supports the observed set
$D_n$ in the sense of maximizing the likelihood $p(D_n \mid \theta)$, which for any
$\theta$ is given by the joint density value
\[
p(D_n \mid \theta) = p(x_1, x_2, \ldots, x_n \mid \theta). \tag{A.1}
\]
Let us assume that the observations are i.i.d. realizations of $p(x; \theta)$. The
likelihood can then be written as
\[
p(D_n \mid \theta) = p(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} p(x_i; \theta). \tag{A.2}
\]
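As an illustration (an example added here, with the exponential family chosen for
concreteness), if each observation follows the exponential density
$p(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, then (A.2) becomes
\[
p(D_n \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
= \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr).
\]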
Note that $p(D_n \mid \theta)$ is a function of the parameter vector $\theta$ and, as a function
of $\theta$, it is \emph{not} a probability density function (for instance, its integral may
differ from 1). Since the logarithm function is monotonic and given the exponential
form of many common distributions, it is usually more convenient
to maximize the log-likelihood instead of (A.2). We then search for
\[
\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta \mid D_n),
\quad \text{with} \quad
L(\theta \mid D_n) = \sum_{i=1}^{n} \ln p(x_i; \theta). \tag{A.3}
\]
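Continuing the illustrative exponential example introduced after (A.2), the
log-likelihood (A.3) reads
\[
L(\lambda \mid D_n) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i,
\]
and setting its derivative with respect to $\lambda$ to zero yields the estimate
$\hat{\lambda} = n / \sum_{i=1}^{n} x_i$, the reciprocal of the sample mean.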
For discrete distributions one can still apply formula (A.3), interpreting the
$p(x_i; \theta)$ as PMF values. The method is quite appealing and has excellent
mathematical properties, especially for large $n$ (see, e.g., [136]).
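In practice the maximization in (A.3) is often carried out numerically rather than in
closed form. The following sketch (added here for illustration; the synthetic data,
the Gaussian family, and the helper name neg_log_likelihood are assumptions, not part
of the original text) minimizes the negative log-likelihood with a general-purpose
optimizer and compares the result with the closed-form Gaussian ML estimates.

# Sketch: numerical ML estimation for a Gaussian p(x; mu, sigma).
# All names and data below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic observations D_n

def neg_log_likelihood(params, data):
    # Negative of L(theta | D_n) in (A.3); sigma is optimized on a log scale
    # so that it stays positive.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Maximize the log-likelihood by minimizing its negative.
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,), method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Closed-form Gaussian ML estimates for comparison: the sample mean and the
# (biased) sample standard deviation.
print(mu_hat, sigma_hat)
print(x.mean(), x.std())

The numerical optimum agrees, up to the optimizer's tolerance, with the closed-form
estimates, which provides a convenient sanity check in cases where (A.3) has no
analytical solution.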