not talked about any imputation yet. The reason is that EM is a meta-algorithm that is adapted to a particular application.
To use EM for imputation, we first need to choose a plausible set of parameters; that is, we need to assume that the data follow a probability distribution, which is usually seen as a drawback of this kind of method. The EM algorithm works better with probability distributions that are easy to maximize, such as Gaussian mixture models. In [85] an EM approach using a multivariate Gaussian is proposed, since multivariate Gaussian data can be parameterized by the mean and the covariance matrix.
In each iteration of the EM algorithm for imputation, the estimates of the mean $\mu$ and the covariance matrix $\Sigma$ are revised in three phases. These parameters are used to apply a regression over the MVs by using the complete data. In the first phase, for each instance with MVs the regression parameters $B$ for the MVs are calculated from the current estimates of the mean and covariance matrix and the available complete data. Next the MVs are imputed with their conditional expectation values from the available complete ones and the estimated regression coefficients:
$$\hat{x}_{mis} = \hat{\mu}_{mis} + (x_{obs} - \hat{\mu}_{obs})\,\hat{B} + e, \qquad (4.9)$$
where the instance $x$ of $n$ attributes is separated into the observed values $x_{obs}$ and the missing ones $x_{mis}$. The mean and covariance matrix are also separated in such a way. The residual $e \in \mathbb{R}^{1 \times n_{mis}}$ is assumed to be a random vector with mean zero and unknown covariance matrix. These two phases complete the E-step. Please note that for the iteration of the algorithm the imputation itself is not strictly needed, since only the estimates of the mean and covariance matrix, as well as the regression parameters, are. But our ultimate goal is to have our data set filled, so we use the latest regression parameters to create the best imputed values so far.
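To make the E-step concrete, below is a minimal NumPy sketch, under the assumption of a single multivariate Gaussian, of the regression imputation of Eq. (4.9) for one instance. The function name impute_instance, the missing-value mask convention and the toy estimates are illustrative choices, not part of [85]; the zero-mean residual $e$ is dropped, so the conditional expectation itself is imputed.

import numpy as np

def impute_instance(x, mu, Sigma):
    # Boolean masks of missing and observed attributes for this instance.
    mis = np.isnan(x)
    obs = ~mis
    # Partition the current estimates of the mean and covariance matrix.
    mu_obs, mu_mis = mu[obs], mu[mis]
    S_oo = Sigma[np.ix_(obs, obs)]
    S_om = Sigma[np.ix_(obs, mis)]
    # Regression coefficients B = Sigma_oo^{-1} Sigma_om (made explicit in
    # Eq. (4.10) below); a linear solve is used instead of an explicit inverse.
    B = np.linalg.solve(S_oo, S_om)
    # Conditional expectation of the missing values, Eq. (4.9) with e = 0.
    x_imp = x.copy()
    x_imp[mis] = mu_mis + (x[obs] - mu_obs) @ B
    return x_imp

# Toy usage with hypothetical current estimates mu and Sigma.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
x = np.array([0.4, np.nan, 2.5])
print(impute_instance(x, mu, Sigma))

In the full algorithm, the covariance of the imputation error, Eq. (4.11) below, would also be recorded for use in the M-step.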
In the third phase the M-step is completed by re-estimating the mean and covariance matrix. The mean is taken as the sample mean of the completed data set, and the covariance is obtained from the sample covariance matrix of the completed data together with the covariance matrices of the imputation errors, as shown in [54]. That is:
$$\hat{B} = \hat{\Sigma}_{obs,obs}^{-1}\,\hat{\Sigma}_{obs,mis}, \qquad (4.10)$$
and
$$\hat{C} = \hat{\Sigma}_{mis,mis} - \hat{\Sigma}_{mis,obs}\,\hat{\Sigma}_{obs,obs}^{-1}\,\hat{\Sigma}_{obs,mis}. \qquad (4.11)$$
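As a hedged illustration of Eqs. (4.10) and (4.11), the snippet below derives $\hat{B}$ and the residual covariance $\hat{C}$ from the partitioned covariance estimate, given the index sets of observed and missing attributes of an instance. The helper name regression_params is an assumption made for this example, and again a linear solve replaces the explicit inverse.

import numpy as np

def regression_params(Sigma, obs, mis):
    # Partition the current covariance estimate over the observed and
    # missing attributes of one instance (obs and mis are index arrays).
    S_oo = Sigma[np.ix_(obs, obs)]
    S_om = Sigma[np.ix_(obs, mis)]
    S_mo = Sigma[np.ix_(mis, obs)]
    S_mm = Sigma[np.ix_(mis, mis)]
    B = np.linalg.solve(S_oo, S_om)   # Eq. (4.10): Sigma_oo^{-1} Sigma_om
    C = S_mm - S_mo @ B               # Eq. (4.11): Schur complement of Sigma_oo
    return B, C

Here $\hat{C}$ is the conditional covariance of the missing attributes given the observed ones, which is the part the M-step adds back so that the re-estimated covariance does not shrink towards the noise-free imputed values.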
The hat accent $\hat{A}$ designates an estimate of a quantity $A$. After updating $\hat{B}$ and $\hat{C}$, the mean and covariance matrix must be updated with
$$\mu^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \hat{X}_i \qquad (4.12)$$
and