Three Approaches to Missing Attribute Values:
A Rough Set Perspective
Jerzy W. Grzymala-Busse
Department of Electrical Engineering and Computer Science, University of Kansas,
Lawrence, KS 66045, USA
and
Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, Poland
Jerzy@ku.edu
http://lightning.eecs.ku.edu/index.html
Summary. A new approach to missing attribute values, based on the idea of an
attribute-concept value, is studied in the paper. This approach, together with two
other approaches to missing attribute values, based on “do not care” conditions
and lost values, is discussed using rough set methodology, including attribute-value
pair blocks, characteristic sets, and characteristic relations. Characteristic sets are
a generalization of elementary sets, while characteristic relations are a generalization
of the indiscernibility relation. Additionally, three definitions of lower and upper
approximations are discussed and used for induction of certain and possible rules.
1 Introduction
In this chapter data sets are presented in the form of decision tables, where
columns are labeled by variables and rows by case (or example) names. Variables
are categorized into independent variables, also called attributes, and
dependent variables, also called decisions. Usually decision tables have only
one decision. The set of all cases that correspond to the same decision value
is called a concept (or a class).
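The terminology above can be illustrated with a minimal sketch. The table contents, the attribute names, and the helper name concepts are invented for illustration and are not taken from the chapter; the sketch only assumes a single decision per case.

```python
from collections import defaultdict

# Hypothetical decision table (illustrative values only).
# Each row: case name -> (attribute values, decision value).
ATTRIBUTES = ("Temperature", "Headache")
TABLE = {
    1: (("high", "yes"), "sick"),
    2: (("high", "no"), "sick"),
    3: (("normal", "no"), "healthy"),
    4: (("normal", "yes"), "sick"),
    5: (("normal", "no"), "healthy"),
}


def concepts(table):
    """Group case names by decision value; each group is one concept."""
    groups = defaultdict(set)
    for case, (_attribute_values, decision) in table.items():
        groups[decision].add(case)
    return dict(groups)


if __name__ == "__main__":
    for decision, cases in concepts(TABLE).items():
        print(decision, sorted(cases))
```

Here the rows play the role of cases, the tuple of values holds the attributes, and the last field is the decision; every distinct decision value yields one concept.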
In most papers on rough set theory it is assumed that values, for all variables
and all cases, are specified. For such tables the indiscernibility relation,
one of the most fundamental ideas of rough set theory, describes which cases
cannot be distinguished from each other.
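For a completely specified table, the standard rough set definition treats two cases as indiscernible with respect to a set of attributes when they agree on every attribute in that set; the resulting equivalence classes are the elementary sets. The following is a minimal sketch of that idea, with a made-up table and a hypothetical helper name elementary_sets, neither of which comes from the chapter.

```python
from collections import defaultdict


def elementary_sets(rows, attributes):
    """Partition case names into blocks of cases that agree on all given attributes."""
    blocks = defaultdict(set)
    for case, values in rows.items():
        # Cases with the same key are indiscernible with respect to `attributes`.
        key = tuple(values[a] for a in attributes)
        blocks[key].add(case)
    # Order the blocks by their smallest case name for a stable result.
    return sorted(blocks.values(), key=min)


# Hypothetical complete table: every attribute value is specified.
ROWS = {
    1: {"Temperature": "high", "Headache": "yes"},
    2: {"Temperature": "high", "Headache": "yes"},
    3: {"Temperature": "normal", "Headache": "no"},
    4: {"Temperature": "high", "Headache": "no"},
}

if __name__ == "__main__":
    print(elementary_sets(ROWS, ["Temperature", "Headache"]))
    print(elementary_sets(ROWS, ["Temperature"]))
```

Using fewer attributes can only merge blocks, never split them: in this sketch cases 1, 2, and 4 become indiscernible once Headache is ignored. The grouping relies on every value being present, which is exactly the assumption that fails for tables with missing attribute values.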
However, in many real-life applications, data sets have missing attribute
values or, in other words, the corresponding decision tables are incompletely
specified. For simplicity, incompletely specified decision tables will be
called incomplete decision tables.
In data mining two main strategies are used to deal with missing attribute
values. The first strategy is based on conversion of incomplete data sets
(i.e., data sets with missing attribute values) into complete data sets and then