Database Reference
In-Depth Information
Adapting to Concept and Population Drift [14,18,19,21,34].
Activity Monitoring [9].
Rather than challenging the problem of detecting significant changes, the
above methods deal directly with data mining in changing environment.
This chapter introduces a novel methodology for detecting a significant
change in a classification model of data mining, by identifying distinct cat-
egories of changes and implementing a set of statistical estimators. The
major contribution of our change detection procedure (as will be described
in later sections), is the ability to make a confident claim that the model
that was pre-built based on a suciently large dataset is no longer valid for
any future use such as prediction, rule induction, etc., and consequently, a
new model must be constructed.
The rest of the chapter is organized as follows. In Section 2, we present
our change detection procedure in data mining models, by defining the data
mining classification model characteristics (2.1), describing the variety of
possible significant changes (2.2), definition of the hypothesis testing (2.3),
and the methodology for the change detection procedure (2.4). In Section 3,
we describe an experimental evaluation of the change detection method-
ology by introducing artificial changes in databases and implementing the
change detection methodology to identify these changes. Section 4 describes
evaluation of the change detection methodology on real-world datasets.
Section 5 presents validation of the work's assumptions and Section 6 con-
cludes this chapter by summing-up the main contributions of our method,
and presenting several options for future research in implementation and
extension of our methodology.
2. Change Detection in Classification Models
of Data Mining
2.1. Classification Model Characteristics
Classification is the task which involves constructing a model predicting the
(usually categorical) label of a classifying attribute and using it to classify
new data. The induced classification model can be represented as a set of
rules (see the RISE system [8], the BMA method for rule induction [6],
PARULEL and PARADISER etc.), a decision tree (see [34-36] ), neural
networks (see [35]), information-theoretic connectionists networks (see IFN
[25]) and so on.
Given a database
D
which consist of
X
set of records. The data mining
model
M
is one that by using an algorithm (
G
) generates a set of hypothesis
Search WWH ::




Custom Search