So the next question arises: to solve a maximum likelihood type problem, can we analytically maximize the likelihood function? We have shown it can work with one-dimensional Bernoulli problems like the coin toss, and that it also works with the one-dimensional Gaussian by finding the μ and σ parameters. To illustrate the latter case let us assume that we have the samples 1, 4, 7, 9 obtained from a normal distribution and, for the sake of simplicity, that we only want to estimate the population mean, that is, in this simplistic case θ = μ. The maximum likelihood problem here is to choose a specific value of μ and compute p(1) · p(4) · p(7) · p(9). Intuitively one can say that this probability would be very small if we fix μ = 10 and would be higher for μ = 4 or μ = 5. The value of μ that produces the maximum product of combined probabilities is what we call the maximum likelihood estimate of μ. Again, in our case the maximum likelihood estimate is the sample mean μ = 5.25, and if we add the variance σ² to the problem it can again be solved using the sample variance as the best estimator.
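To make the arithmetic concrete, the following short sketch (a hypothetical illustration in Python/NumPy, with σ fixed to 1 since only the mean is being estimated here) evaluates the product p(1) · p(4) · p(7) · p(9) for a few candidate values of μ and confirms that the sample mean 5.25 gives the largest likelihood:

```python
import numpy as np

samples = np.array([1.0, 4.0, 7.0, 9.0])

def likelihood(mu, sigma=1.0):
    """Product p(1) * p(4) * p(7) * p(9) under a N(mu, sigma^2) model."""
    dens = np.exp(-(samples - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return dens.prod()

# Evaluate a few candidate means: values near the sample mean win.
for mu in [4.0, 5.0, 5.25, 10.0]:
    print(f"mu = {mu:5.2f}  ->  likelihood = {likelihood(mu):.3e}")

# A fine grid search confirms the maximizer is the sample mean (5.25).
grid = np.linspace(0, 12, 1201)
print("argmax over grid:", grid[np.argmax([likelihood(m) for m in grid])])
print("sample mean:     ", samples.mean())
```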
In real world data things are not that easy. We can have distributions that are not well behaved or that have too many parameters, making the actual solution computationally too complex. Having a likelihood function made of a mixture of 100 100-dimensional Gaussians would yield 10,000 parameters, and thus direct trial-and-error maximization is not feasible. The way to deal with such complexity is to introduce hidden variables in order to simplify the likelihood function and, in our case, to account for MVs as well. The observed variables are those that can be directly measured from the data, while hidden variables influence the data but are not trivial to measure. An example of an observed variable would be whether it is sunny today, whereas the hidden variable could be P(sunny today | sunny yesterday).
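As a toy illustration of the observed/hidden distinction (our own assumption, not an example from the text), the following Python snippet generates data from a two-regime process: the regime label plays the role of the hidden variable and the recorded measurement plays the role of the observed variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden variable: which of two "regimes" generated each day (never recorded).
# Observed variable: the measurement we can actually see (e.g. a temperature).
n = 1000
hidden = rng.choice([0, 1], size=n, p=[0.3, 0.7])
observed = rng.normal(loc=np.where(hidden == 0, 5.0, 20.0),  # regime-dependent mean
                      scale=3.0)

# With the hidden labels the likelihood factorizes per regime and maximizing it
# is easy; without them we must marginalize over the hidden variable, which is
# what makes the optimization hard in the first place.
for k in (0, 1):
    print(f"regime {k}: mean of its observations = {observed[hidden == k].mean():.2f}")
```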
Even simplifying with hidden variables does not allow us to reach the solution in a single step. The most common approach in these cases would be an iterative one in which we obtain some parameter estimates, use a regression technique to impute the values, and repeat. However, as the imputed values will depend on the estimated parameters θ, they will not add any useful information to the process and can be ignored. There are several techniques to obtain maximum likelihood estimators. The most well known and simplest is the EM algorithm presented in the next section.
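A minimal sketch of the estimate-impute-repeat loop just described, under assumed toy data and with a single linear regression as the imputation model (an illustrative choice of ours, not a prescription from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bivariate data; the second column has missing values (np.nan).
x = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])
mask = rng.random(200) < 0.3
x[mask, 1] = np.nan

y = x[:, 1].copy()
y[np.isnan(y)] = np.nanmean(x[:, 1])        # crude starting imputation

for _ in range(20):
    # 1. Estimate the parameters from the currently completed data.
    beta = np.polyfit(x[:, 0], y, 1)         # simple regression y ~ x0
    # 2. Re-impute the missing entries from those parameters and repeat.
    y[mask] = np.polyval(beta, x[mask, 0])

print("fitted slope and intercept after the loop:", beta)
```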
4.4.1 Expectation-Maximization (EM)
In a nutshell, the EM algorithm estimates the parameters of a probability distribution. In our case this can be achieved from incomplete data. It iteratively maximizes the likelihood of the complete data X_obs, considered as a function of the parameters [20].
That is, we want to model the dependent random variables: the observed variable a and the hidden variable b that generates a. We stated that a set of unknown parameters θ governs the probability distributions P_θ(a), P_θ(b). As an iterative process, the EM algorithm alternates between an expectation (E) step and a maximization (M) step.
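As an illustrative sketch (the exact E and M formulas depend on the chosen model, and this toy mixture is our assumption rather than the book's example), the following Python code runs EM on a two-component Gaussian mixture: the observed variable a is each draw and the hidden variable b is the unobserved component that generated it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed variable a: draws from a two-component Gaussian mixture.
# Hidden variable b: the component that generated each draw (never observed).
a = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])

# Initial guesses for theta = (weights, means, variances).
w, mu, var = np.array([0.5, 0.5]), np.array([1.0, 4.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior probability of the hidden component b given a and theta.
    resp = w * normal_pdf(a[:, None], mu, var)        # shape (n, 2)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate theta by maximizing the expected complete-data likelihood.
    nk = resp.sum(axis=0)
    w = nk / len(a)
    mu = (resp * a[:, None]).sum(axis=0) / nk
    var = (resp * (a[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w.round(3), "means:", mu.round(3), "variances:", var.round(3))
```

With the hidden component marginalized out in this way, each iteration is guaranteed not to decrease the likelihood, which is the property that makes EM attractive for the incomplete-data problems discussed above.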
 