A software package that can test the MCAR condition has recently been
published [43].
A third case arises when MAR does not apply because the MV depends on both the
rest of the observed values and the missing value itself. That is,
$$P(B \mid X_{obs}, X_{mis}, \zeta) \tag{4.4}$$
is the actual probability estimation. This model is usually called not missing at
random (NMAR) or missing not at random (MNAR) in the literature. This model of
missingness is a challenge for the user, as the only way to obtain an unbiased estimate
is to model the missingness as well. This is a very complex task: we must
create a model accounting for the missing data and later incorporate it into
a more complex model used to estimate the MVs. However, even when we cannot
account for the missingness model, the introduced bias may still be small enough
to be acceptable. In [23] the reader can find an example of how to perform this.
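As an illustration of what Eq. (4.4) expresses, the following is a minimal sketch of a missingness model in which the probability of a value being missing depends on both an observed attribute and the unobserved value itself. The logistic form, the function names and the parameter tuple `zeta` are assumptions made for this example only; they are not taken from [23] or from any particular software package.

```python
import math
import random


def missing_probability(x_obs, x_mis, zeta):
    """P(B = 1 | x_obs, x_mis, zeta) under an assumed logistic form.

    zeta = (intercept, weight on x_obs, weight on x_mis) is a
    hypothetical parameterization chosen for illustration.
    """
    intercept, w_obs, w_mis = zeta
    score = intercept + w_obs * x_obs + w_mis * x_mis
    return 1.0 / (1.0 + math.exp(-score))


def sample_mask(rows, zeta, seed=0):
    """Draw one missingness indicator B per (x_obs, x_mis) row."""
    rng = random.Random(seed)
    return [
        rng.random() < missing_probability(x_obs, x_mis, zeta)
        for x_obs, x_mis in rows
    ]


# A non-zero weight on x_mis is what makes this mechanism MNAR:
# the chance of being missing changes with the value that is hidden.
zeta_mnar = (0.0, 0.5, -2.0)
p_low = missing_probability(1.0, 0.0, zeta_mnar)   # small hidden value
p_high = missing_probability(1.0, 3.0, zeta_mnar)  # large hidden value

# With the weight on x_mis set to zero, the hidden value no longer matters.
zeta_no_mis = (0.0, 0.5, 0.0)
p_a = missing_probability(1.0, 0.0, zeta_no_mis)
p_b = missing_probability(1.0, 3.0, zeta_no_mis)
```

In this sketch `p_low` and `p_high` differ only because the hidden value differs, which is the dependence that cannot be checked from the observed data alone and that has to be modeled explicitly.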
4.3 Simple Approaches to Missing Data
In this section we introduce the simplest methods used to deal with MVs. As
they are very simple, they usually do not take into account the missingness mechanism
and they blindly perform the operation.
The simplest approach is do not impute (DNI). As its name indicates,
all the MVs remain unreplaced, so the DM algorithm must use its default MV
strategy, if present. Often the objective is to verify whether imputation methods
allow the classification methods to perform better than when using the original data
sets. As a guideline, a previous study of imputation methods is presented in [37]. As
an alternative for learning methods that cannot deal with an explicit MV notation
(such as a special value), another approach is to convert the MVs to a new
value (encode them into a new numerical value), but such a simplistic method has
been shown to lead to serious inference problems [82].
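The difference between DNI and encoding the MVs as a new numerical value can be seen on a toy attribute. The sketch below is only an illustration of one way a sentinel value can distort a summary statistic; the data and the sentinel `-1` are hypothetical, and it is not a reproduction of the analysis in [82].

```python
from statistics import mean

# Toy numerical attribute; None marks an MV.
ages = [23, 35, None, 41, None, 29]

# DNI: the MVs stay in the data and the consumer skips them.
observed = [a for a in ages if a is not None]
mean_dni = mean(observed)

# Encoding: replace each MV with a new numerical value.
# SENTINEL is a hypothetical choice made for this example.
SENTINEL = -1
encoded = [SENTINEL if a is None else a for a in ages]
mean_encoded = mean(encoded)
```

Here `mean_dni` is computed from the four observed ages only, while `mean_encoded` treats the sentinel as if it were a real age, so any method that reads the encoded column as ordinary numbers inherits that distortion.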
A very common approach in the specialized literature, even nowadays, is to apply
case deletion or ignore missing (IM). Using this method, all instances with at least
one MV are discarded from the data set. Although IM often results in a substantial
decrease in the sample size available for the analysis, it does have important
advantages. In particular, under the assumption that the data are MCAR, it leads to
unbiased parameter estimates. Unfortunately, even when the data are MCAR there is a
loss in power using this approach, especially if we have to rule out a large number of
subjects. And when the data are not MCAR, it biases the results. For example, when
low-income individuals are less likely to report their income level, the resulting mean
is biased in favor of higher incomes. The alternative approaches discussed below
should be considered as a replacement for IM.
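The income example can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the population is synthetic, and the helper `drop_incomplete`, the income threshold of 40,000 and the hiding probabilities are hypothetical values picked only to make the effect visible.

```python
import random
from statistics import mean


def drop_incomplete(rows):
    """Case deletion (IM): keep only the rows without any MV."""
    return [row for row in rows if all(v is not None for v in row)]


def hide_prob(income):
    """Assumed non-MCAR mechanism: low incomes are reported less often."""
    return 0.6 if income < 40_000 else 0.1


rng = random.Random(42)

# Synthetic population of (age, income) pairs.
population = [
    (rng.randint(20, 65), rng.uniform(10_000, 100_000))
    for _ in range(20_000)
]
true_mean = mean(income for _, income in population)

# MCAR: every income is hidden with the same probability.
mcar = [
    (age, None if rng.random() < 0.3 else income)
    for age, income in population
]

# Not MCAR: the chance of hiding an income depends on the income itself.
not_mcar = [
    (age, None if rng.random() < hide_prob(income) else income)
    for age, income in population
]

mean_mcar = mean(income for _, income in drop_incomplete(mcar))
mean_not_mcar = mean(income for _, income in drop_incomplete(not_mcar))

print(f"true mean:          {true_mean:,.0f}")
print(f"IM under MCAR:      {mean_mcar:,.0f}")
print(f"IM when not MCAR:   {mean_not_mcar:,.0f}")
```

Under the MCAR mask the mean after case deletion stays close to the population mean while the sample shrinks, whereas under the income-dependent mask the surviving rows over-represent high incomes and the mean shifts upward.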
Often seen as a good choice, the substitution of the MVs for the global most
common attribute value for nominal attributes, and global average value for numerical
 