As noted before, most existing methods have dealt with "How can the
model M be updated efficiently when a new period K is encountered?" or
"How can we adapt to the time factor?", rather than asking the following
questions:
"Was the model significantly changed during the period K?"
"What was the nature of the change?"
"Should we consider several of the past periods as redundant or not
required in order for an algorithm G to generate a better model M?"
Hence, the objective of this work is to define and evaluate a change
detection methodology for identifying a significant change that happened
during period K in a classification model, which was incrementally built
over periods 1 to K−1, based on the data that was accumulated during
period K.
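To make the first question above concrete, the following is a minimal sketch of one generic way to ask whether a model's behavior shifted in period K: compare its error rate on validation records from periods 1 to K−1 with its error rate on records from period K, using a standard two-proportion z-test. The test is a swapped-in illustration, not necessarily the methodology this work goes on to define, and the function name `error_rate_shift` and all counts are hypothetical.

```python
import math


def error_rate_shift(errors_old, n_old, errors_new, n_new):
    """Two-proportion z-test on misclassification counts.

    Hypothetical inputs: error counts and record totals for validation
    data from periods 1..K-1 (old) and for period K (new).
    Returns (z, two_sided_p_value).
    """
    if n_old <= 0 or n_new <= 0:
        raise ValueError("both samples must contain at least one record")
    p_old = errors_old / n_old
    p_new = errors_new / n_new
    # Pooled error rate under the null hypothesis of no change.
    pooled = (errors_old + errors_new) / (n_old + n_new)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_old + 1.0 / n_new))
    if se == 0.0:
        # Both samples are all-correct or all-wrong: no evidence of a shift.
        return 0.0, 1.0
    z = (p_new - p_old) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts: 50 errors in 500 earlier validation records,
# 30 errors in 150 records from period K.
z, p = error_rate_shift(50, 500, 30, 150)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here would only suggest that the error rate differs between the two samples; it says nothing about the nature of the change, which is the second question above.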
2.2. Variety of Changes
Various significant changes might occur when inducing the model M using
the algorithm G. There are several possible causes for significant
changes in the data mining model:
(i) A change in the probability distribution of one or more of the input
attributes (A). For example, a database may consist of 45% males and 55%
females in periods 1 to K−1, while in period K all records represent
males.
(ii) A change in the distribution of the target variable (T). For example,
consider examining the rate of failures in a final exam based on the
characteristics of the students in consecutive years. If the average
failure rate was 20% in 1999 and 40% in 2000, then a change in the target
distribution has occurred.
(iii) A change in the "patterns" (rules), which define the relationship of
the input attributes to the target variable. That is, a change in the
model M, derived from a change in a set of hypotheses in H. For instance,
in the case of examining the rate of failures in final exams based on the
characteristics of the students over consecutive years, if in 1999 male
students had 60% failures and female students had 5% failures, and in
2000 the situation was the opposite, then it is obvious that there was a
change in the patterns of behavior. This work defines this cause for a
significant change in a Data Mining model M by the following definition:
A change C is encountered in the period K if the validation error of the