The second, more important reason follows directly from Theorems 6 and 7
and is expressed in the following.
Theorem 8. The preprocessing generalization relation $\prec_{prep}$ is a
weak generalization relation and is not a data mining generalization
relation (Definition 10), i.e.
$\prec_{prep} \cap \prec_{dm} = \emptyset$.
This theorem states that preprocessing operations are a weak generalization,
disjoint from the strong, data mining generalization. We nevertheless
perform the weak preprocessing generalization because it leads to the
strong generalization in the next, data mining proper stage. We need the
preprocessing stage to improve the quality, i.e. the granularity, of the data
mining proper generalization. This is the reason why we routinely call the
preprocessing transformations a generalization.
Theorem 8 also says that within the framework of our generalization
model we are able to distinguish (as we should have) the generalization that
occurs during the preprocessing stage of the data mining process from the
generalization of the data mining proper stage.
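The disjointness claim of Theorem 8 can be pictured with a toy example. The
sketch below treats each generalization relation as a set of ordered pairs of
knowledge states and checks that the two sets share no pair. The state labels
and the pairs themselves are invented for illustration; they are not derived
from Definitions 10 or 25.

```python
# Hypothetical knowledge-state labels, chosen only for illustration.
prep_relation = {("raw", "cleaned"), ("cleaned", "discretized")}
dm_relation = {("discretized", "rules"), ("discretized", "clusters")}


def are_disjoint(r1, r2):
    """Return True when two relations (sets of ordered pairs) share no pair."""
    return r1.isdisjoint(r2)


# In this toy setting the preprocessing and data mining relations never
# relate the same pair of states, mirroring the statement of the theorem.
disjoint = are_disjoint(prep_relation, dm_relation)
```

A relation is never disjoint from itself unless it is empty, so
are_disjoint(prep_relation, prep_relation) is False here.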
6 Data Preprocessing Model
It is natural that when building a model of the data mining process one has
to include data preprocessing methods and algorithms, i.e. one has to model
within it the preprocessing stage as well as the data mining proper stage. In
order to achieve this task we introduced the notion of a weak information
generalization relation as a component of our weak generalization model
(Definition 22). We then introduced the preprocessing and the data mining
generalization relations (Definitions 25 and 10, respectively) and proved
(Theorem 8) that the preprocessing relation is a special case of the weak
information generalization relation and that it is disjoint from our data
mining generalization relation.
Consequently we define here a semantic model of data preprocessing, as a
particular case of our generalization model (Definition 1), as follows.
Definition 26. When we adopt the preprocessing generalization relation
$\prec_{prep}$ (Definition 25) as the information generalization relation of
the generalization model GM (Definition 1) we call the model thus obtained a
Preprocessing Model and denote it PM, i.e.
$PM = (U, \mathcal{K}_{prep}, \mathcal{G}_{prep}, \prec_{prep})$
where $\mathcal{K}_{prep}$ is the set of preprocessing knowledge states
(Definition 24), and $\mathcal{G}_{prep} \subseteq \mathcal{G}$ is called a
set of preprocessing generalization operators defined on
$\mathcal{K}_{prep}$. We assume that
$\mathcal{G}_{prep} \cap \mathcal{G}_{dm} = \emptyset$, where
$\mathcal{G}_{dm}$ is the set of data mining operators (Definition 12).
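The four components of the Preprocessing Model of Definition 26 can be
sketched as a small data structure. The code below is only an illustration
under stated assumptions: knowledge states are stood in for by strings,
operators by plain functions, and the relation by a set of ordered pairs. The
names PreprocessingModel, is_well_formed, clean and mine are hypothetical, and
the requirement that operators map preprocessing states back into
preprocessing states is an assumed reading of "defined on" the set of
preprocessing knowledge states.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = str                          # stand-in for a knowledge state
Operator = Callable[[State], State]  # stand-in for a generalization operator


@dataclass(frozen=True)
class PreprocessingModel:
    """Toy container for the four components of a Preprocessing Model."""
    universe: FrozenSet[str]                  # U
    states: FrozenSet[State]                  # preprocessing knowledge states
    operators: FrozenSet[Operator]            # preprocessing operators
    relation: FrozenSet[Tuple[State, State]]  # preprocessing relation as pairs

    def is_well_formed(self, dm_operators: FrozenSet[Operator]) -> bool:
        # The relation may only relate preprocessing knowledge states.
        relation_ok = all(a in self.states and b in self.states
                          for a, b in self.relation)
        # Assumption: each operator maps preprocessing states to
        # preprocessing states.
        closed = all(op(s) in self.states
                     for op in self.operators for s in self.states)
        # Condition from the definition: preprocessing and data mining
        # operator sets are disjoint.
        disjoint = self.operators.isdisjoint(dm_operators)
        return relation_ok and closed and disjoint


def clean(state: State) -> State:
    # Hypothetical preprocessing operator: "raw" becomes "cleaned".
    return "cleaned" if state == "raw" else state


def mine(state: State) -> State:
    # Hypothetical data mining operator, kept outside the preprocessing set.
    return "rules"


pm = PreprocessingModel(
    universe=frozenset({"x1", "x2"}),
    states=frozenset({"raw", "cleaned"}),
    operators=frozenset({clean}),
    relation=frozenset({("raw", "cleaned")}),
)
```

Calling pm.is_well_formed(frozenset({mine})) succeeds because clean and mine
are distinct operators, whereas passing a data mining set that also contains
clean violates the disjointness assumption and the check fails.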