CHANGE DETECTION IN CLASSIFICATION MODELS INDUCED FROM TIME SERIES DATA - Data Mining in Time Series Databases


$n_{K-1}$ is the number of records in periods $1,\ldots,K-1$.

$x_{iK}$ is the number of records in the $i$th class of variable $X$ in period $K$.

$x_{iK-1}$ is the number of records in the $i$th class of variable $X$ in periods $1,\ldots,K-1$.

If $X_p > \chi^2_{1-\alpha}(j-1)$, where $j$ is the number of classes of the tested variable, then the null hypothesis that the distribution of variable $X$ has been stationary in period $K$, as in the previous periods, is rejected.

The explanation of Pearson's statistical hypothesis testing is provided in [28].
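The rejection rule can be illustrated with a short Python sketch. It assumes the standard Pearson goodness-of-fit form of the statistic, with the expected class shares taken from periods 1,...,K-1; whether this matches the estimator defined earlier in the chapter term by term is an assumption. The function names are hypothetical, and the critical value is passed in by the caller (for example 5.991 for three classes at a significance level of 0.05).

```python
def pearson_change_statistic(x_prev, x_curr):
    """Pearson chi-square statistic comparing period K with periods 1..K-1.

    x_prev[i] -- records in class i over periods 1..K-1 (x_{iK-1})
    x_curr[i] -- records in class i in period K         (x_{iK})
    Assumed form: n_K * sum((x_iK/n_K - x_iK-1/n_K-1)^2 / (x_iK-1/n_K-1)).
    """
    if len(x_prev) != len(x_curr):
        raise ValueError("both count lists must cover the same classes")
    if any(count <= 0 for count in x_prev):
        raise ValueError("every class needs at least one record in periods 1..K-1")
    n_prev = sum(x_prev)
    n_curr = sum(x_curr)
    if n_curr <= 0:
        raise ValueError("period K contains no records")
    total = 0.0
    for prev_count, curr_count in zip(x_prev, x_curr):
        expected_share = prev_count / n_prev   # share of class i in periods 1..K-1
        observed_share = curr_count / n_curr   # share of class i in period K
        total += (observed_share - expected_share) ** 2 / expected_share
    return n_curr * total


def distribution_changed(x_prev, x_curr, critical_value):
    """Reject stationarity when the statistic exceeds the chi-square critical value."""
    return pearson_change_statistic(x_prev, x_curr) > critical_value


# Chi-square critical value for 1 - alpha = 0.95 and j - 1 = 2 degrees of freedom.
CRITICAL_95_DF2 = 5.991
mild_shift = distribution_changed([50, 30, 20], [20, 20, 10], CRITICAL_95_DF2)
strong_shift = distribution_changed([50, 30, 20], [10, 30, 10], CRITICAL_95_DF2)
```

In this made-up example the first period-K sample stays below the critical value, while the second exceeds it and would lead to rejecting the null hypothesis.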

2.4. Methodology

This section describes the algorithmic usage of the previous estimators:

Inputs:

• G is the DM algorithm used for constructing the classification model (e.g., C4.5 or IFN).

• M is the classification model constructed by the DM algorithm (e.g., a decision tree).

• V is the validation method in use (e.g., 5-fold cross-validation).

• K is the cumulative number of periods in a data stream.

• α is the desired significance level for the change detection procedure (the probability of a false alarm when no actual change is present).

Outputs:

• CD(α) is the error-based change detection estimator (1 - p-value).

• XP(α) is the Pearson's chi-square estimator of distribution change (1 - p-value).
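Both outputs are reported as 1 - p-value. For the chi-square estimator, this quantity is the chi-square cumulative distribution function evaluated at the test statistic. The sketch below shows one way to compute it with the Python standard library, using the series expansion of the regularized lower incomplete gamma function. The function names are hypothetical, and the series is only meant for the moderate statistic values typical of such tests, not as a production-grade routine.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma.

    P(a, z) = z^a * exp(-z) / Gamma(a) * sum_{n>=0} z^n / (a * (a+1) * ... * (a+n)),
    with a = df / 2 and z = x / 2.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0:
        return 0.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:   # remaining terms no longer change the sum
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def one_minus_p_value(statistic, num_classes):
    """1 - p-value of a chi-square statistic with j - 1 degrees of freedom."""
    return chi2_cdf(statistic, num_classes - 1)


# A statistic of 24.0 over three classes gives a value very close to 1.
confidence = one_minus_p_value(24.0, 3)
```

Comparing this value with 1 - α is equivalent to comparing the statistic itself with the critical value for j - 1 degrees of freedom.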

2.5. Change Detection Procedure

Stage 1:

For periods $K-1$, build the model $M_{K-1}$ using the DM algorithm G.

Define the data set $D_{K-1(val)}$.

Count the number of records $n_{K-1} = |D_{K-1(val)}|$.

Calculate the validation error rate $\hat{e}_{M_{K-1},K-1}$ according to the validation method V.

Calculate $x_{iK-1}$, $n_{K-1}$ for every input and target variable existing in periods $1,\ldots,K-1$.
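The steps of Stage 1 can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the text: a majority-class predictor stands in for the DM algorithm G, a simple hold-out split of the last records stands in for the validation method V, and the class counts are taken over the validation set. All function and parameter names are hypothetical.

```python
from collections import Counter


def majority_class_learner(records, target):
    """Hypothetical stand-in for G: always predicts the most frequent target value."""
    most_common_value = Counter(r[target] for r in records).most_common(1)[0][0]
    return lambda record: most_common_value


def stage1(records, target, learner, n_val):
    """Build M_{K-1} and collect the Stage 1 quantities.

    records -- list of dicts holding the data of the previous periods
    n_val   -- number of trailing records held out as D_{K-1(val)} (assumed hold-out V)
    """
    if not 0 < n_val < len(records):
        raise ValueError("n_val must leave records for both training and validation")
    train, val = records[:-n_val], records[-n_val:]
    model = learner(train, target)                 # model M_{K-1}
    n_prev = len(val)                              # n_{K-1} = |D_{K-1(val)}|
    errors = sum(1 for r in val if model(r) != r[target])
    error_rate = errors / n_prev                   # validation error rate of M_{K-1}
    # x_{iK-1}: records per class, for every input and target variable
    class_counts = {var: Counter(r[var] for r in val) for var in records[0]}
    return model, n_prev, error_rate, class_counts
```

A real application would replace `majority_class_learner` with the chosen algorithm (e.g., C4.5 or IFN) and the hold-out split with the chosen validation method.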
