CHANGE DETECTION IN CLASSIFICATION MODELS INDUCED FROM TIME SERIES DATA - Data Mining in Time Series Databases

Database Reference

In-Depth Information

Stage 2:

For period

K

, define the set of records

d K .

Count the number of records

n K =

|d K |

.

Calculate ˆ

e M K− 1 ,K

according to the validation method

V

.

Calculate the difference

ˆ

σ d ,

σ d ·

d

=

ABS

(ˆ

e M K− 1 ,K −

ˆ

e M K− 1 ,K− 1 )

,

ˆ

H 0 =

z (1 −α/ 2) ·

CD

α

Calculate and Return

(

).

Stage 3:

For every input and target variable existing in periods 1

,...,K

:

X p

Calculate:

x iK ,

n K

and

.

Calculate and Return

XP

(

α

).

n K ).

Also, it is very easy to store information about the distributions of target

and input variables in order to simplify the change detection methodology.

Based on the outputs of the change detection procedure, the user can

make a distinction between the eight possible variations of a change in

the data mining classification model (see sub-section above). Knowing the

causes of the change (if any), the user of this new methodology can act in

several ways, including reapplying the algorithm from scratch to the new

data, absorb the new period and update the model by using an incremental

algorithm, make

It is obvious that the complexity of this procedure is at most

O

(

K =

+ 1 and perform the change detection procedure

again for the next period, explore the type of the change and its magnitude

and effect on other characteristics of the

K

DM

model, and incorporate other

known methods dealing with the specific change(s) detected. One may also

apply multiple model approaches such as boosting, voting, bagging, etc.

The methodology is not restricted to databases with a constant number

of variables. The basic assumption is that if the addition of a new variable

will influence the relationship between the target variable and the input

variables in a manner that changes the validation accuracy, it will be iden-

tified as a significant change.

The procedure has three major stages. The first one is designed to per-

form initialization of the procedure. The second stage is aimed at detecting

a significant change in the “patterns” (rules) of the pre-built data-mining

model, as described in the previous section. The third stage is designated to

test whether the distribution of one or more variable(s) in the set of input

or target variable(s) has changed.

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home