Database Reference
model is the best estimator of what will happen in the future. An important
factor that must not be overlooked is time. As data accumulates
incrementally in a time series database, one must examine whether the data
in a new period agrees with the data in previous periods and decide
accordingly how learning should proceed in the future. This work presents a
novel method for detecting significant changes in classification models
induced from continuously accumulated data streams by batch learning
methods.
The following statements can summarize the major contributions of this
work to the area of data mining and knowledge discovery in databases:
(i) This work defines three main causes for a statistically significant
change in a data-mining model:
    A change in the probability distribution of one or more of the
    candidate input variables A.
    A change in the distribution of the target variable T.
    A change in the “patterns” (rules), which define the relationship
    of the candidate inputs to the target variable. That is, a change in
    the model M.
This work has shown that although there are three main causes for
significant changes in data-mining models, these causes commonly
co-exist in the same data stream, yielding eight possible
combinations for a significant change in a classification model induced
from time series data. Moreover, these causes affect each other in a
manner and magnitude that depend on the database being mined and
the algorithm in use.
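The count of eight can be illustrated with a short sketch. It assumes that "eight" refers to the 2^3 present/absent patterns of the three causes A, T and M, including the pattern in which none of them changes; that reading, and the names `CAUSES`, `change_combinations` and `describe`, are illustrative assumptions rather than part of the original work.

```python
from itertools import product

# Labels follow the text: A = candidate inputs, T = target, M = model (rules).
CAUSES = ("A", "T", "M")


def change_combinations():
    """Return every present/absent pattern of the three causes (2**3 = 8)."""
    return [
        dict(zip(CAUSES, flags))
        for flags in product((False, True), repeat=len(CAUSES))
    ]


def describe(combo):
    """Render one pattern as a short human-readable label."""
    changed = [name for name, flag in combo.items() if flag]
    return "change in " + ", ".join(changed) if changed else "no change"


if __name__ == "__main__":
    for combo in change_combinations():
        print(describe(combo))
```

Each pattern could then be matched against the outcome of the individual tests for A, T and M to label the kind of change observed in a period.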
(ii) The change can be detected by the change detection procedure using a
three-stage validation technique. This technique is designed to detect
all possible significant changes.
(iii) The change detection method relies on the implementation of two
statistical tests:
(a) Change Detection hypothesis testing (CD) of every period K,
based on the definition of a significant change C in classification
“rules”, with respect to the previous K − 1 periods.
(b) Pearson's estimator (XP) for testing matching proportions of
variables to detect a significant change in the probability distribution
of candidate input and target attributes.
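As a rough illustration of test (b), the sketch below computes a Pearson chi-square statistic that compares the category proportions of one attribute in period K with the proportions pooled over the previous K − 1 periods. It is a minimal sketch under stated assumptions, not the authors' exact formulation: the previous proportions are treated as the expected distribution, and the function names, the example counts and the hard-coded 5% critical value are all hypothetical.

```python
def pearson_statistic(prev_counts, new_counts):
    """Pearson chi-square statistic for matching proportions.

    prev_counts: category counts pooled over the previous K-1 periods
    new_counts:  category counts observed in period K
    Returns (statistic, degrees_of_freedom).
    """
    if len(prev_counts) != len(new_counts):
        raise ValueError("both periods must use the same categories")
    n_prev = sum(prev_counts)
    n_new = sum(new_counts)
    statistic = 0.0
    for prev, new in zip(prev_counts, new_counts):
        # Expected count in period K if the old proportions still hold.
        expected = n_new * prev / n_prev
        if expected == 0:
            raise ValueError("a category unseen in previous periods "
                             "cannot be tested this way")
        statistic += (new - expected) ** 2 / expected
    return statistic, len(prev_counts) - 1


def distribution_changed(prev_counts, new_counts, critical_value):
    """Flag a significant change when the statistic exceeds the critical value."""
    statistic, _ = pearson_statistic(prev_counts, new_counts)
    return statistic > critical_value


if __name__ == "__main__":
    # Hypothetical counts for a three-valued attribute.
    previous = [500, 300, 200]
    current = [30, 30, 40]
    # 5.991 is the chi-square critical value for 2 degrees of freedom at 5%.
    print(pearson_statistic(previous, current))
    print(distribution_changed(previous, current, critical_value=5.991))
```

The same function can be applied to each candidate input attribute and to the target attribute in turn; the critical value must be chosen to match the degrees of freedom of the attribute being tested.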