Segmentation of Continuous Data Streams Based on a Change Detection Methodology - Advanced Techniques in Knowledge Discovery and Data Mining


error rate of a given incremental model is accurately estimated by the previous $K-1$ periods. Therefore, a change in the rules (R) is encountered during period $K$ if the validation error rate of the model $M_{K-1}$ (the model based on $D_{K-1}$) on the database $D_{K-1}$ is significantly different from the validation error rate of the model $M_{K-1}$ over $d_K$.

Therefore, the parameter of interest for the statistical hypothesis test is the true validation error rate, and the null hypothesis is as follows:

$$H_0:\ e^{Val}_{M_{K-1},K} = e^{Val}_{M_{K-1},K-1} \qquad H_1:\ e^{Val}_{M_{K-1},K} \neq e^{Val}_{M_{K-1},K-1} \tag{4.2}$$

where $e^{Val}_{M_{K-1},\cdot}$ is the true validation error rate of the incremental model $M_{K-1}$, based on $K-1$ periods, on the indicated set of records. These true rates are estimated by $\hat{e}_{M_{K-1},K-1}$, the validation error rate of the $D_{K-1}$ set of records on model $M_{K-1}$ (the standard validation error of the model), and by $\hat{e}_{M_{K-1},K}$, the validation error rate of the set of records $d_K$ on the aggregated model $M_{K-1}$.

To detect a significant difference between the two error rates (see also Mitchell [29]), the hypothesis in Eq. (4.2) is tested as a difference between two independent proportions, using the normal approximation.

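The decision is based on the following equations (two-sided hypothesis):

$$\widehat{Var}(\hat{d}) = \frac{\hat{e}_{M_{K-1},K}\,(1-\hat{e}_{M_{K-1},K})}{n_K} + \frac{\hat{e}_{M_{K-1},K-1}\,(1-\hat{e}_{M_{K-1},K-1})}{n_{K-1(val)}}$$

$$\hat{d} = \left|\hat{e}_{M_{K-1},K} - \hat{e}_{M_{K-1},K-1}\right| \tag{4.3}$$

If $\hat{d} \geq z_{1-\alpha/2}\sqrt{\widehat{Var}(\hat{d})}$, then do not accept $H_0$.

Here $\hat{e}_{M_{K-1},K-1}$ is the validation error rate of the $D_{K-1}$ set of records on model $M_{K-1}$ (the standard observed validation error of the model); $n_{K-1(val)} = |D_{K-1(val)}|$ is the number of records selected for validation from periods $1, \ldots, K-1$; $\hat{e}_{M_{K-1},K}$ is the observed validation error rate of the set of records $d_K$ on the aggregated model $M_{K-1}$; $n_K = |d_K|$ is the number of records in period $K$; and $z_{1-\alpha/2}$ is the standard normal quantile for significance level $\alpha$.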
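The two-sided test of a difference between two independent proportions described around Eq. (4.3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `error_rate_changed`, its parameter names, and the default significance level of 0.05 are assumptions introduced here, and the decision rule assumes the usual normal-approximation form (reject when the absolute difference reaches the critical value times the standard error).

```python
import math
from statistics import NormalDist


def error_rate_changed(e_prev, n_prev, e_curr, n_curr, alpha=0.05):
    """Two-sided normal-approximation test for two independent error rates.

    e_prev, n_prev: observed validation error rate of the model on the
                    validation records of periods 1..K-1, and their count.
    e_curr, n_curr: observed error rate of the same model on the records
                    of period K, and their count.
    alpha:          significance level (hypothetical default, not from the text).

    Returns (changed, d_hat, threshold).
    """
    if n_prev <= 0 or n_curr <= 0:
        raise ValueError("sample sizes must be positive")
    for e in (e_prev, e_curr):
        if not 0.0 <= e <= 1.0:
            raise ValueError("error rates must lie in [0, 1]")

    # Estimated variance of the difference between two independent proportions.
    var = e_curr * (1 - e_curr) / n_curr + e_prev * (1 - e_prev) / n_prev

    # Absolute observed difference between the two error rates.
    d_hat = abs(e_curr - e_prev)

    # Critical value of the standard normal distribution for a two-sided test.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    threshold = z * math.sqrt(var)

    # Guard: identical degenerate rates (variance 0, difference 0) are
    # treated as "no change" rather than a trivial rejection.
    changed = d_hat >= threshold and d_hat > 0
    return changed, d_hat, threshold


# Illustrative numbers only: 10% error on 1000 validation records,
# then 20% error on 500 records of the new period.
changed, d_hat, threshold = error_rate_changed(0.10, 1000, 0.20, 500)
print(changed, round(d_hat, 3), round(threshold, 4))
```

With these illustrative inputs the observed difference (0.10) is well above the threshold (about 0.04), so the null hypothesis is not accepted and a change is flagged; a rise from 10% to 11% with the same sample sizes would not be flagged.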

4.2.3.2 Statistical Hypothesis for the Distribution Change Detection

The objective of the second estimator is to validate the assumption that the population of one or more variables (target or candidate) has changed significantly in a statistical sense. For this purpose, we use Pearson's estimator for testing matching
