So the next question arises: to solve a maximum likelihood type problem, can we analytically maximize the likelihood function? We have shown it can work with one-dimensional Bernoulli problems like the coin toss, and that it also works with the one-dimensional Gaussian by finding the μ and σ parameters. To illustrate the latter case let us assume that we have the samples 1, 4, 7, 9 obtained from a normal distribution and, for the sake of simplicity, that we only want to estimate the population mean, that is, in this simplistic case θ = μ. The maximum likelihood problem here is to choose a specific value of μ and compute p(1) · p(4) · p(7) · p(9). Intuitively one can say that this probability would be very small if we fix μ = 10 and would be higher for μ = 4 or μ = 5. The value of μ that produces the maximum product of combined probabilities is what we call the maximum likelihood estimate of μ. Again, in our case the maximum likelihood estimate is the sample mean μ = 5.25, and if we add the variance σ² to the problem it can again be solved using the sample variance as the best estimator.
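To make the arithmetic concrete, the following short sketch (a hypothetical illustration in Python/NumPy, with σ fixed to 1 since only the mean is being estimated here) evaluates the product p(1) · p(4) · p(7) · p(9) for a few candidate values of μ and confirms that the sample mean 5.25 gives the largest likelihood:

```python
import numpy as np

samples = np.array([1.0, 4.0, 7.0, 9.0])

def likelihood(mu, sigma=1.0):
    """Product p(1) * p(4) * p(7) * p(9) under a N(mu, sigma^2) model."""
    dens = np.exp(-(samples - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return dens.prod()

# Evaluate a few candidate means: values near the sample mean win.
for mu in [4.0, 5.0, 5.25, 10.0]:
    print(f"mu = {mu:5.2f}  ->  likelihood = {likelihood(mu):.3e}")

# A fine grid search confirms the maximizer is the sample mean (5.25).
grid = np.linspace(0, 12, 1201)
print("argmax over grid:", grid[np.argmax([likelihood(m) for m in grid])])
print("sample mean:     ", samples.mean())
```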
In real world data things are not that easy. We can have distributions that are not well behaved or that have too many parameters, making the actual solution computationally too complex. Having a likelihood function made of a mixture of 100 100-dimensional Gaussians would yield 10,000 parameters, and thus direct trial-and-error maximization is not feasible. The way to deal with such complexity is to introduce hidden variables in order to simplify the likelihood function and, in our case, to account for MVs as well. The observed variables are those that can be directly measured from the data, while hidden variables influence the data but are not trivial to measure. An example of an observed variable would be whether it is sunny today, whereas the hidden variable could be P(sunny today | sunny yesterday).
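As a toy illustration of the observed/hidden distinction (our own assumption, not an example from the text), the following Python snippet generates data from a two-regime process: the regime label plays the role of the hidden variable and the recorded measurement plays the role of the observed variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden variable: which of two "regimes" generated each day (never recorded).
# Observed variable: the measurement we can actually see (e.g. a temperature).
n = 1000
hidden = rng.choice([0, 1], size=n, p=[0.3, 0.7])
observed = rng.normal(loc=np.where(hidden == 0, 5.0, 20.0),  # regime-dependent mean
                      scale=3.0)

# With the hidden labels the likelihood factorizes per regime and maximizing it
# is easy; without them we must marginalize over the hidden variable, which is
# what makes the optimization hard in the first place.
for k in (0, 1):
    print(f"regime {k}: mean of its observations = {observed[hidden == k].mean():.2f}")
```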
Even simplifying with hidden variables does not allow us to reach the solution in a single step. The most common approach in these cases would be an iterative one in which we obtain some parameter estimates, use a regression technique to impute the values, and repeat. However, as the imputed values will depend on the estimated parameters θ, they will not add any useful information to the process and can be ignored. There are several techniques to obtain maximum likelihood estimators. The most well known and simplest is the EM algorithm presented in the next section.
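A minimal sketch of the estimate-impute-repeat loop just described, under assumed toy data and with a single linear regression as the imputation model (an illustrative choice of ours, not a prescription from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bivariate data; the second column has missing values (np.nan).
x = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])
mask = rng.random(200) < 0.3
x[mask, 1] = np.nan

y = x[:, 1].copy()
y[np.isnan(y)] = np.nanmean(x[:, 1])        # crude starting imputation

for _ in range(20):
    # 1. Estimate the parameters from the currently completed data.
    beta = np.polyfit(x[:, 0], y, 1)         # simple regression y ~ x0
    # 2. Re-impute the missing entries from those parameters and repeat.
    y[mask] = np.polyval(beta, x[mask, 0])

print("fitted slope and intercept after the loop:", beta)
```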
4.4.1 Expectation-Maximization (EM)
In a nutshell, the EM algorithm estimates the parameters of a probability distribution. In our case this can be achieved from incomplete data. It iteratively maximizes the likelihood of the complete data X_obs, considered as a function of the parameters [20].
That is, we want to model the dependent random variables: the observed variable a and the hidden variable b that generates a. We stated that a set of unknown parameters θ governs the probability distributions P_θ(a), P_θ(b). As an iterative process, the EM algorithm alternates between an expectation (E) step and a maximization (M) step.
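As an illustrative sketch (the exact E and M formulas depend on the chosen model, and this toy mixture is our assumption rather than the book's example), the following Python code runs EM on a two-component Gaussian mixture: the observed variable a is each draw and the hidden variable b is the unobserved component that generated it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed variable a: draws from a two-component Gaussian mixture.
# Hidden variable b: the component that generated each draw (never observed).
a = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])

# Initial guesses for theta = (weights, means, variances).
w, mu, var = np.array([0.5, 0.5]), np.array([1.0, 4.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior probability of the hidden component b given a and theta.
    resp = w * normal_pdf(a[:, None], mu, var)        # shape (n, 2)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate theta by maximizing the expected complete-data likelihood.
    nk = resp.sum(axis=0)
    w = nk / len(a)
    mu = (resp * a[:, None]).sum(axis=0) / nk
    var = (resp * (a[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w.round(3), "means:", mu.round(3), "variances:", var.round(3))
```

With the hidden component marginalized out in this way, each iteration is guaranteed not to decrease the likelihood, which is the property that makes EM attractive for the incomplete-data problems discussed above.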
 