Chapter 4
Dealing with Missing Values
Abstract. In this chapter the reader is introduced to the approaches used in the literature to tackle the presence of Missing Values (MVs). In real-life data, information is frequently lost because attributes contain missing values, which hampers data mining. Several schemes have been studied to overcome the drawbacks that missing values cause in data mining tasks; one of the best known is based on preprocessing and is formally known as imputation. After the introduction in Sect. 4.1, the chapter presents the theoretical background in Sect. 4.2, which analyzes the underlying distribution of the missingness. From this point on, the successive sections go from the simplest approaches in Sect. 4.3 to the most advanced proposals, focusing on the imputation of the MVs. The scope of such advanced methods includes the classic maximum likelihood procedures, such as Expectation-Maximization and Multiple Imputation (Sect. 4.4), and the latest Machine Learning based approaches, which use classification or regression algorithms to perform the imputation (Sect. 4.5). Finally, a comparative experimental study is carried out in Sect. 4.6.
4.1 Introduction
Many existing industrial and research data sets contain MVs in their attribute values. Intuitively, an MV is simply an attribute value that was not entered or was lost during the recording process. MVs arise for various reasons, such as manual data entry procedures, equipment errors and incorrect measurements. The presence of such imperfections usually requires a preprocessing stage in which the data is prepared and cleaned [71] so that it is useful and sufficiently clean for the knowledge extraction process. The simplest way of dealing with MVs is to discard the examples that contain them. However, this method is practical only when the data contains a relatively small number of examples with MVs and when analyzing only the complete examples will not introduce serious bias into the inference [54].
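The discard strategy and its two conditions can be made concrete with a small sketch. It assumes examples are stored as Python tuples with None marking an MV; the helpers drop_incomplete, incomplete_fraction and column_mean, as well as the toy data, are hypothetical and invented purely for illustration.

```python
# Hypothetical representation: each example is a tuple of attribute
# values, and None marks a missing value (MV).

def drop_incomplete(examples):
    """Listwise deletion: keep only examples with no missing values."""
    return [ex for ex in examples if all(v is not None for v in ex)]


def incomplete_fraction(examples):
    """Fraction of examples that contain at least one missing value."""
    if not examples:
        return 0.0
    return sum(any(v is None for v in ex) for ex in examples) / len(examples)


def column_mean(examples, idx):
    """Mean of attribute `idx`, ignoring examples where it is missing."""
    values = [ex[idx] for ex in examples if ex[idx] is not None]
    return sum(values) / len(values)


# Invented toy data: (age, income in thousands). Income is missing
# only for the two oldest people, so missingness depends on age.
data = [
    (25, 30.0),
    (32, 45.0),
    (41, None),
    (38, 52.0),
    (50, None),
    (29, 35.0),
]

complete = drop_incomplete(data)
print(len(complete))                          # 4
print(round(incomplete_fraction(data), 3))    # 0.333
print(round(column_mean(data, 0), 2))         # 35.83 (all examples)
print(column_mean(complete, 0))               # 31.0 (complete examples only)
```

In this toy case, a third of the examples are discarded, and the mean age of the remaining complete examples (31.0) is noticeably lower than the mean over all examples (about 35.83), because the discarded examples are exactly the older ones. Deletion only leaves the inference unbiased when the incomplete examples are few and do not differ systematically from the complete ones.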
MVs make data analysis difficult, and their presence can pose serious problems for researchers. In fact, inappropriate handling of MVs in the analysis may introduce bias and can result in misleading conclusions being drawn