Database Reference
In-Depth Information
Stage 2:
For period
K
, define the set of records
d K .
Count the number of records
n K =
|d K |
.
Calculate ˆ
e M K− 1 ,K
according to the validation method
V
.
Calculate the difference
ˆ
σ d ,
σ d ·
d
=
ABS
e M K− 1 ,K
ˆ
e M K− 1 ,K− 1 )
,
ˆ
H 0 =
z (1 −α/ 2) ·
CD
α
Calculate and Return
(
).
Stage 3:
For every input and target variable existing in periods 1
,...,K
:
X p
Calculate:
x iK ,
n K
and
.
Calculate and Return
XP
(
α
).
n K ).
Also, it is very easy to store information about the distributions of target
and input variables in order to simplify the change detection methodology.
Based on the outputs of the change detection procedure, the user can
make a distinction between the eight possible variations of a change in
the data mining classification model (see sub-section above). Knowing the
causes of the change (if any), the user of this new methodology can act in
several ways, including reapplying the algorithm from scratch to the new
data, absorb the new period and update the model by using an incremental
algorithm, make
It is obvious that the complexity of this procedure is at most
O
(
K =
+ 1 and perform the change detection procedure
again for the next period, explore the type of the change and its magnitude
and effect on other characteristics of the
K
DM
model, and incorporate other
known methods dealing with the specific change(s) detected. One may also
apply multiple model approaches such as boosting, voting, bagging, etc.
The methodology is not restricted to databases with a constant number
of variables. The basic assumption is that if the addition of a new variable
will influence the relationship between the target variable and the input
variables in a manner that changes the validation accuracy, it will be iden-
tified as a significant change.
The procedure has three major stages. The first one is designed to per-
form initialization of the procedure. The second stage is aimed at detecting
a significant change in the “patterns” (rules) of the pre-built data-mining
model, as described in the previous section. The third stage is designated to
test whether the distribution of one or more variable(s) in the set of input
or target variable(s) has changed.
Search WWH ::




Custom Search