Database Reference
In-Depth Information
The important factor that must not be set aside is the time factor. As more data
is accumulated into the problem domain, incrementally over time, one must
examine whether the new data agree with the previous data sets and make the
relevant assumptions about the future.
This work presented a novel change-detection method for detecting significant
changes in data for building data-mining models. We also addressed the data-
segmentation problem.
The major contributions of this research to the area of data mining and KDD
are:
1. This work defines three main causes for a statistically significant change in a
data-mining model:
z a change in the distribution of one or more of the candidate variables
attributes (A),
z a change in the distribution of the target variable (T), and
z a change in the “patterns” (rules) that define the relationship of the
candidate input to the target variable. That is, a change in the model M.
This work showed that although there are three main causes for significant
changes in the data-mining classification models, it is common that these main
causes will interact with each other, deriving eight possible combinations for a
significant change in an aggregated data-mining model. Moreover, the effect of
these causes is not the same for all databases and algorithms.
1. The change-detection method relies on the implementation of two statistical
estimators: change-detection hypothesis testing (CD) of every period K and
Pearson's estimator (XP) for testing matching proportions of variables.
2. The effect of a change is relevant and can be mainly detected at the period of
the change. If not detected, the influence of a change in a database will be
reflected in successive periods.
3. The change-detection method has a low computation cost. Its complexity is
O( n K ) due to only checking whether the new data agree with the prior
aggregated model.
4. By implementing the change detection in a data stream, we can detect
significant changes between succeeding segments and decide on a better
segmentation of a data stream based on a statistical analysis and ranking
schema.
5. Our change-detection methodology may be used as a basis for an automated
procedure aimed at finding the best segmentation of a given data stream by
integrating the change-detection methodology with a search algorithm.
The change detection procedure with the use of the statistical estimators can
detect significant changes in classification models of data mining. These changes
can be detected independently of the data-mining algorithm (e.g., C4.5 and ID3 by
Quinlan; KS2, ITI and DMTI by Utgoff [35-36]; IDTM by Kohavi [22]; Shen's
CDL4 [33]) or the DM classification model (rules, decision trees, networks, etc.),
which are used for constructing the corresponding model.
Search WWH ::




Custom Search