Rubin's rules to obtain an overall set of estimated coefficients and standard errors proceed as follows. Let $R$ denote the estimate of interest and $U$ its estimated variance, $R$ being either an estimated regression coefficient or a kernel parameter of an SVM, whichever applies. Once the MIs have been obtained, we will have $m$ estimates $R_1, R_2, \ldots, R_m$ and their respective variances $U_1, U_2, \ldots, U_m$. The overall estimate, occasionally called the MI estimate, is given by
$$\bar{R} = \frac{1}{m} \sum_{i=1}^{m} R_i. \tag{4.16}$$
The variance for the estimate has two components: the variability within each
data set and across data sets. The within imputation variance is simply the average
of the estimated variances:
$$\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i, \tag{4.17}$$
whereas the between-imputation variance is the sample variance of the proper estimates:
$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left(R_i - \bar{R}\right)^2. \tag{4.18}$$
The total variance $T$ is the corrected sum of these two components, with a factor that accounts for the simulation error in $\bar{R}$,
$$T = \bar{U} + \left(1 + \frac{1}{m}\right) B. \tag{4.19}$$
The square root of $T$ is the overall standard error associated with $\bar{R}$. If no MVs were present in the original data set, all of $R_1, R_2, \ldots, R_m$ would be the same, so $B = 0$ and $T = \bar{U}$. The magnitude of $B$ with respect to $\bar{U}$ indicates how much information is contained in the missing portion of the data set relative to the observed part.
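As a minimal sketch of the pooling step in Eqs. (4.16) to (4.19), the following Python function combines per-imputation estimates and variances. The function name `pool_rubin` and the numeric inputs are hypothetical and chosen only for illustration; they do not come from the text.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates R_i and their variances U_i."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    r_bar = mean(estimates)          # Eq. (4.16): overall (MI) estimate
    u_bar = mean(variances)          # Eq. (4.17): within-imputation variance
    b = variance(estimates, r_bar)   # Eq. (4.18): sample variance, divisor m - 1
    t = u_bar + (1 + 1 / m) * b      # Eq. (4.19): total variance
    return r_bar, u_bar, b, t, sqrt(t)  # sqrt(t) is the overall standard error


# Hypothetical estimates and variances from m = 3 imputed data sets.
r_bar, u_bar, b, t, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(r_bar, u_bar, b, t, se)
```

Passing identical estimates, as would happen with no MVs, gives a between-imputation variance of zero, so the total variance reduces to the within-imputation variance.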
In [83] the authors elaborate further on the confidence intervals derived from $\bar{R}$ and on how to test the null hypothesis $R = 0$ by comparing the ratio $\bar{R}/\sqrt{T}$ with a Student's $t$-distribution with degrees of freedom
$$df = (m-1)\left(1 + \frac{m\bar{U}}{(m+1)B}\right)^2, \tag{4.20}$$
for readers who would like to further their knowledge of how to use this test to check whether the number of MIs $m$ was large enough.
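The test quantities can be sketched in a few lines of Python, assuming the pooled values $\bar{R}$, $\bar{U}$ and $B$ are already available and that $T > 0$. The function name `rubin_t_test`, the choice to return an infinite number of degrees of freedom when $B = 0$, and the input values are assumptions made for this illustration, not part of the source. Computing a p-value would additionally need the CDF of the $t$-distribution, which is left out here.

```python
from math import inf, sqrt


def rubin_t_test(r_bar, u_bar, b, m):
    """Return the ratio R-bar / sqrt(T) and the degrees of freedom of Eq. (4.20)."""
    t_var = u_bar + (1 + 1 / m) * b   # Eq. (4.19): total variance, assumed > 0
    ratio = r_bar / sqrt(t_var)       # statistic compared with Student's t
    if b == 0:
        # Assumption: with no between-imputation variance the df is unbounded.
        df = inf
    else:
        df = (m - 1) * (1 + m * u_bar / ((m + 1) * b)) ** 2  # Eq. (4.20)
    return ratio, df


# Hypothetical pooled values for m = 3 imputations.
ratio, df = rubin_t_test(r_bar=2.0, u_bar=0.5, b=1.0, m=3)
print(ratio, df)
```

The ratio would then be compared against the quantiles of a $t$-distribution with `df` degrees of freedom.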