$$\tilde{\Sigma}^{(t+1)} = \frac{1}{\tilde{n}} \sum_{i=1}^{n} S_i^{(t+1)}, \qquad (4.13)$$

where, for each instance $x_i$, the conditional expectation $S_i^{(t+1)}$ of the cross-products is composed of three parts. The two parts that involve the available values in the instance are

$$E(x_{obs} x_{obs}^T \mid x_{obs}; \mu^{(t)}, \Sigma^{(t)}) = x_{obs} x_{obs}^T \qquad (4.14)$$

and the cross term $E(x_{obs} x_{mis}^T \mid x_{obs}; \mu^{(t)}, \Sigma^{(t)}) = x_{obs} \hat{x}_{mis}^T$, where $\hat{x}_{mis}$ denotes the imputed (conditionally expected) missing values. The third part,

$$E(x_{mis} x_{mis}^T \mid x_{obs}; \mu^{(t)}, \Sigma^{(t)}) = \hat{x}_{mis} \hat{x}_{mis}^T + C, \qquad (4.15)$$

is the sum of the cross-product of the imputed values and the residual covariance matrix $C = \mathrm{Cov}(x_{mis}, x_{mis} \mid x_{obs}; \mu^{(t)}, \Sigma^{(t)})$, the conditional covariance matrix of the imputation error. The normalization constant $\tilde{n}$ of the covariance matrix estimate [Eq. (4.13)] is the number of degrees of freedom of the sample covariance matrix of the completed data set.
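To make these conditional expectations concrete, the following minimal sketch computes $\hat{x}_{mis}$, $C$, and $S_i$ for a single instance under the multivariate normal model, using the standard conditional-Gaussian formulas. It assumes NumPy, NaN as the missing-value marker, and an illustrative function name (e_step_instance) that does not come from [85].

import numpy as np

def e_step_instance(x, mu, Sigma):
    # E-step for one instance x (1-D array, NaN = missing) under the
    # current estimates mu and Sigma of the multivariate normal model.
    obs = ~np.isnan(x)
    mis = ~obs
    # B: regression coefficients of the missing part on the observed part.
    B = Sigma[np.ix_(mis, obs)] @ np.linalg.inv(Sigma[np.ix_(obs, obs)])
    # Imputed values: conditional mean of x_mis given x_obs.
    x_hat = x.copy()
    x_hat[mis] = mu[mis] + B @ (x[obs] - mu[obs])
    # Residual covariance C = Cov(x_mis, x_mis | x_obs), as in Eq. (4.15).
    C = Sigma[np.ix_(mis, mis)] - B @ Sigma[np.ix_(obs, mis)]
    # Expected cross-products S_i: outer product of the completed
    # instance, with C added on the missing-missing block.
    S = np.outer(x_hat, x_hat)
    S[np.ix_(mis, mis)] += C
    return x_hat, S

When no values are missing in the instance, the sketch reduces to the plain cross-product of Eq. (4.14); when all are missing, it returns the current mean and covariance, as expected.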
The first estimation of the mean and covariance matrix needs to rely on a completely observed data set. One solution adopted in [85] is to fill in the missing values using initial estimates of the mean and covariance matrix. The process ends when the estimates of the mean and covariance matrix change by less than a predefined threshold between iterations, as in the sketch below. Please note that this EM approach is only well suited to numeric data sets, which constitutes a limitation for the application of EM, although an extension to mixed numerical and nominal attributes can be found in [82].
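As a rough illustration of the whole procedure, the sketch below wires the E-step above into the EM loop, with a mean-based initialization and the threshold-based stopping rule described in this section. It reuses e_step_instance from the previous sketch; em_impute, tol, and max_iter are illustrative choices, not the exact formulation of [85] (in particular, it uses the maximum likelihood normalization $\tilde{n} = n$ and subtracts the outer product of the updated mean, which matches Eq. (4.13) when $S_i$ is taken over deviations from the mean).

import numpy as np

def em_impute(X, tol=1e-6, max_iter=100):
    # EM imputation for a numeric data set X with NaN marking missing
    # entries; reuses e_step_instance from the sketch above.
    X = np.asarray(X, dtype=float)
    # Initialization: fill each column with its observed mean to obtain
    # a completely observed data set for the first estimates.
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(np.isnan(X), col_means, X)
    mu = X_filled.mean(axis=0)
    Sigma = np.cov(X_filled, rowvar=False, bias=True)
    for _ in range(max_iter):
        x_hats, Ss = zip(*(e_step_instance(x, mu, Sigma) for x in X))
        mu_new = np.mean(x_hats, axis=0)
        # ML analogue of Eq. (4.13): average the expected cross-products
        # and subtract the outer product of the updated mean.
        Sigma_new = np.mean(Ss, axis=0) - np.outer(mu_new, mu_new)
        # Stop when neither estimate changes by more than the threshold.
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(Sigma_new - Sigma).max() < tol)
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    # Final imputation with the converged parameters.
    X_imputed = np.stack([e_step_instance(x, mu, Sigma)[0] for x in X])
    return X_imputed, mu, Sigma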
The EM algorithm is still in use today, but it usually acts as a component of a larger system in which it helps to fit distributions, as with the GTM neural networks in [95]. Research on EM algorithms continues, both to address their limitations and to apply them to new fields such as semi-supervised learning [97]. The best-known version of EM for real-valued data sets is the one introduced in [85], where the basic EM algorithm is extended with a regularization parameter.
4.4.2 Multiple Imputation
One big problem of maximum likelihood methods such as EM is that they tend to underestimate the inherent errors produced by the estimation process, formally the standard errors. The Multiple Imputation (MI) approach was designed to take this into account, making it a less biased imputation method at the cost of being computationally expensive. MI is a Monte Carlo approach, described very well in [80], in which we generate multiple imputed values from the observed data in a very similar way to the EM algorithm: it fills the incomplete data by repeatedly solving the observed-
 