Table 4.1. Definition of the variety of changes in a data-mining model.
(+ = change, - = no change; A = attribute variable(s), T = target variable)

“Rules”  A  T  Details
-        -  -  No change.
-        -  +  A change in the target variable.
-        +  -  A change in the attribute variable(s).
-        +  +  A change in the target and in the attribute variable(s).
+        -  -  A change in “patterns” (rules) of the data-mining model.
+        -  +  A change in “patterns” (rules) of the data-mining model and a change in the target variable.
+        +  -  A change in “patterns” (rules) of the data-mining model and a change in the attribute variable(s).
+        +  +  A change in “patterns” (rules) of the data-mining model and a change in the target and the attribute variable(s).
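Because each row of Table 4.1 is fixed by three binary indicators (the “Rules”, A and T columns), the table can be read as a lookup over all 2^3 = 8 combinations. The following is a minimal sketch of that reading; the names CHANGE_TYPES and classify_change are hypothetical and only illustrate the table, not a method defined in the text.

```python
from itertools import product

# Hypothetical encoding of Table 4.1. Keys are
# (rules_changed, attributes_changed, target_changed), mirroring the
# "Rules", "A" and "T" columns; True corresponds to "+" (a change).
CHANGE_TYPES = {
    (False, False, False): "No change.",
    (False, False, True): "A change in the target variable.",
    (False, True, False): "A change in the attribute variable(s).",
    (False, True, True): "A change in the target and in the attribute variable(s).",
    (True, False, False): 'A change in "patterns" (rules) of the data-mining model.',
    (True, False, True): 'A change in "patterns" (rules) of the data-mining model '
                         "and a change in the target variable.",
    (True, True, False): 'A change in "patterns" (rules) of the data-mining model '
                         "and a change in the attribute variable(s).",
    (True, True, True): 'A change in "patterns" (rules) of the data-mining model '
                        "and a change in the target and the attribute variable(s).",
}


def classify_change(rules_changed: bool, attributes_changed: bool,
                    target_changed: bool) -> str:
    """Return the Table 4.1 description for one combination of detected changes."""
    return CHANGE_TYPES[(rules_changed, attributes_changed, target_changed)]


if __name__ == "__main__":
    # Enumerate all 8 combinations in the same order as the table rows.
    for flags in product((False, True), repeat=3):
        symbols = " ".join("+" if flag else "-" for flag in flags)
        print(symbols, classify_change(*flags))
```

The lookup makes explicit that the three indicators are independent, so every mixture of rule, attribute and target change has its own row rather than being folded into a single "drift" label.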
The definition of the variety of possible changes in a data-mining model is a
new concept. As noted, previous researchers have tended to address one type of
change at a time: concept change (e.g., in the target), population change (e.g., in
the candidate inputs affecting the target), activity monitoring (e.g., of the model),
and so on. The notion that all three major causes of change interact and affect
each other is new, and it is tested and validated in this work.
4.2.3 Statistical Hypothesis Testing
To determine whether a significant change has occurred during period K, this
chapter presents a set of statistical estimators. Their use is subject to three
conditions:
(a) Every period contains enough data (records) to rebuild a model for that
specific period. Whether a period contains sufficient data should be decided from
the relationship between the training and validation error rates of each period.
This decision is subjective, because users differ in the difference between the
two rates they accept, the degree of overfitting they tolerate, and so on.
(b) The same DM algorithm is used in all periods (i.e., the data-mining model in
every period K is constructed with the same DM algorithm).
(c) The same validation method is used in all periods (e.g., 5-fold
cross-validation, 10-fold cross-validation, or holding out one-third of the
records).
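Conditions (a) to (c) can be checked mechanically before any test is run. The sketch below assumes a hypothetical per-period summary (PeriodResult) and approximates condition (a) with a user-chosen bound, max_error_gap, on the absolute difference between the training and validation error rates; the text leaves this threshold to the user, so the bound is an illustrative assumption, not the chapter's rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodResult:
    """Hypothetical summary of the model rebuilt for one period K."""
    period: int
    algorithm: str          # e.g. "C4.5" (illustrative label only)
    validation_method: str  # e.g. "10-fold"
    train_error: float      # training error rate, in [0, 1]
    validation_error: float # validation error rate, in [0, 1]


def check_conditions(periods: list[PeriodResult], max_error_gap: float) -> list[str]:
    """Return the violated conditions; an empty list means the tests may be applied."""
    problems = []
    # (a) Sufficient data: approximated by a user-chosen bound on the gap
    #     between validation and training error (a large gap hints at overfitting).
    for p in periods:
        gap = abs(p.validation_error - p.train_error)
        if gap > max_error_gap:
            problems.append(
                f"period {p.period}: error gap {gap:.3f} exceeds {max_error_gap}"
            )
    # (b) The same DM algorithm must be used in all periods.
    if len({p.algorithm for p in periods}) > 1:
        problems.append("different DM algorithms used across periods")
    # (c) The same validation method must be used in all periods.
    if len({p.validation_method for p in periods}) > 1:
        problems.append("different validation methods used across periods")
    return problems


if __name__ == "__main__":
    history = [
        PeriodResult(1, "C4.5", "10-fold", 0.10, 0.12),
        PeriodResult(2, "C4.5", "10-fold", 0.08, 0.30),
    ]
    for problem in check_conditions(history, max_error_gap=0.05):
        print(problem)
```

In the usage example only period 2 is flagged, because its validation error exceeds its training error by far more than the chosen tolerance; a different user could reasonably pick a looser bound, which is exactly the subjectivity noted in condition (a).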
4.2.3.1 Statistical Hypothesis for the Model Change Detection
The first estimator for the change-detection methodology is designed to detect a
change in the “patterns” (rules) that define the relationship of the candidate inputs
to the target variable, that is, a change in the model M derived from a change in a
set of hypotheses in H. Because we assume that huge amounts of data are involved
in building the incremental model, it is reasonable to assume that the true validation