the time segment. A sliding window bottom-up (SWAB) segmentation algorithm
based on piecewise linear representation is presented in [21]. All these methods
produce segmentation of a time series based on changes in a single parameter.
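The bottom-up step at the core of SWAB can be sketched as follows. This is a simplified illustration only: the stopping threshold `max_error` and the least-squares line fit are assumed choices, not the exact formulation in [21]. Starting from the finest segmentation, the pair of adjacent segments whose merged linear fit has the lowest residual error is merged repeatedly, until every further merge would exceed the threshold.

```python
def fit_error(ys, start, end):
    """Sum of squared residuals of a least-squares line over ys[start:end+1]."""
    n = end - start + 1
    if n < 3:
        return 0.0
    xs = list(range(n))
    seg = ys[start:end + 1]
    mx = sum(xs) / n
    my = sum(seg) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, seg))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, seg))

def bottom_up_segment(ys, max_error):
    """Return a list of (start, end) index pairs covering ys."""
    # Finest segmentation: segments of length two.
    segs = [(i, min(i + 1, len(ys) - 1)) for i in range(0, len(ys), 2)]
    while len(segs) > 1:
        # Cost of merging each adjacent pair of segments.
        costs = [fit_error(ys, segs[i][0], segs[i + 1][1])
                 for i in range(len(segs) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > max_error:
            break
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return segs
```

On a series with two linear regimes, such as `[0, 1, 2, 3, 4, 5, 4, 3, 2, 1]`, the procedure recovers the two segments, since merging across the peak would incur a large residual error.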
One known approach to dealing with changes in classification models is to use incremental or semi-incremental learning methods, which aim to achieve the following goals: (a) decrease computational complexity as a function of time; (b) use prior knowledge to determine future conclusions; (c) update or completely retrain the model to improve its accuracy; and (d) reduce model complexity, for instance, network size, decision tree size and depth, or the size of a set of extracted rules.
These methods include: incremental concept learning with bounded example
memory (Case et al. [2]), Utgoff's [35], [36] method for incremental induction of
decision trees (ITI), Shen's [33] semi-incremental learning method (CDL4),
Cheung's [5] technique for updating association rules in large databases,
Gerevini's [11] network constraints updating technique, Zhang's [40] method for
feed-forward neural networks (SELF), incrementally trained connectionist
networks (Martinez [27]), and a simple backpropagation algorithm for neural
networks (Mangasarian and Solodov [28]).
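The incremental style shared by these methods, in the spirit of goals (b) and (c), can be illustrated by a count-based naive Bayes classifier whose sufficient statistics are updated example by example, so that new data refine the model without a retraining pass over old data. This is a generic illustration, not one of the cited algorithms.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Count-based naive Bayes; each new example updates counts in O(#features)."""

    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)
        # (label, feature index, feature value) -> count
        self.feature_counts = defaultdict(int)

    def update(self, features, label):
        # Absorb one labelled example; no retraining pass over old data.
        self.n += 1
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            self.feature_counts[(label, i, v)] += 1

    def predict(self, features):
        def score(label):
            # log P(label) + sum of log P(feature | label), Laplace-smoothed.
            s = math.log((self.class_counts[label] + 1.0)
                         / (self.n + len(self.class_counts)))
            for i, v in enumerate(features):
                s += math.log((self.feature_counts[(label, i, v)] + 1.0)
                              / (self.class_counts[label] + 2.0))
            return s
        return max(self.class_counts, key=score)
```

Each call to `update` refines the model in constant time per feature, which is precisely what makes goal (a), bounded computational cost over time, attainable.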
The main topic in most incremental learning methods has been how the model (this could be a set of rules, a decision tree, a neural network, and so on) is refined or reconstructed efficiently as new data are encountered. This problem has been addressed by many of the preceding algorithms. However, the real question is: When should we discard the current model and construct a new one, because something in our notion of the model has significantly changed? Hence, the problem is not how to reconstruct the model better, but rather how to detect a change in a model based on accumulated data.
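To make the "when to rebuild" question concrete, one simple detector (an illustration in the spirit of drift-detection heuristics, not the methodology developed here) monitors the current model's error rate on incoming examples and signals a change when it rises well above its best observed level. The three-standard-deviation threshold and the warm-up of 30 examples are illustrative choices.

```python
import math

class ErrorRateMonitor:
    """Signal a change when the model's error rate drifts above its best level."""

    def __init__(self):
        self.n = 0
        self.errors = 0
        self.best_p = None   # lowest observed error rate
        self.best_s = None   # its binomial standard deviation

    def update(self, misclassified):
        """Feed one prediction outcome; return True if a change is signalled."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        # Track the best (lowest) error level seen so far.
        if self.best_p is None or p + s < self.best_p + self.best_s:
            self.best_p, self.best_s = p, s
        # Signal once the error rate rises three std devs above its best level.
        return self.n > 30 and p + s > self.best_p + 3 * self.best_s
```

Fed a stream whose error rate jumps from about 5% to about 50%, the monitor stays quiet on the stable prefix and raises a signal shortly after the jump.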
Learning in the presence of change is not a new concept in the data mining
research area. Some researchers have studied various aspects of mining massive
nonstationary data streams, including:
- defining robustness and discovering robust knowledge from databases (Hsu and Knoblock [16]) or learning stable concepts in domains with hidden changes in concepts (Harries and Horn [13]);
- identifying and modeling a persistent drift in a database (Freund and Mansour [10]);
- adapting to concept and population drift (Helmbold and Long [14], Hulten et al. [17], Kelly et al. [19], Lane and Brodley [23], Widmer and Kubat [37]);
- activity monitoring (Fawcett and Provost [8]); and
- improving accuracy, algorithm run time, and noise reduction by partitioning, arbitrating, or combining models (Ali and Pazzani [1], Chan and Stolfo [3-4], Domingos [6]).
These methods do not directly address the problem of detecting significant changes. Rather, they deal with an environment that changes over time.
This chapter introduces a novel methodology for detecting a significant change in a data mining classification model by applying a set of statistical estimators. Detection of a change implies that the model induced from a sufficiently large data set is no longer valid for uses such as prediction or rule induction, and that a new model, representing the subsequent time segment, must be constructed.
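The kind of statistical decision involved can be illustrated (this is an illustrative test, not necessarily the estimators developed in this chapter) by comparing the model's error rate on the segment it was induced from with its error rate on a new time segment, using a one-sided two-proportion z-test. A significant increase suggests the model should be discarded and re-induced on the new segment.

```python
import math

def error_rate_changed(err_old, n_old, err_new, n_new, z_crit=2.33):
    """One-sided test: has the error rate significantly increased?

    err_old/n_old: errors and sample size on the original segment;
    err_new/n_new: errors and sample size on the new segment;
    z_crit=2.33 corresponds to roughly the 1% significance level.
    """
    p_old = err_old / n_old
    p_new = err_new / n_new
    # Pooled proportion under the null hypothesis of no change.
    p_pool = (err_old + err_new) / (n_old + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    if se == 0:
        return p_new > p_old
    z = (p_new - p_old) / se
    return z > z_crit
```

For example, an error rate rising from 5% to 15% over a thousand examples each is flagged as a significant change, while a rise from 5% to 5.5% is not.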