CHANGE DETECTION IN CLASSIFICATION MODELS INDUCED FROM TIME SERIES DATA - Data Mining in Time Series Databases


The basic assumption behind this procedure is that sufficient statistics are available for a run of the algorithm in every period. As indicated above, if this assumption does not hold, two or more periods must be merged to maintain statistically significant outcomes.
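The merging step can be illustrated with a minimal sketch. It assumes, purely for illustration, that "sufficient statistics" can be approximated by a minimum record count per period; the function name `merge_small_periods` and the parameter `min_records` are hypothetical and do not come from the chapter.

```python
from typing import List, Sequence


def merge_small_periods(periods: Sequence[Sequence], min_records: int) -> List[list]:
    """Greedily merge consecutive periods until every merged block holds
    at least `min_records` records (an assumed stand-in for
    'sufficient statistics')."""
    merged: List[list] = []
    buffer: list = []
    for period in periods:
        buffer.extend(period)
        if len(buffer) >= min_records:
            merged.append(buffer)
            buffer = []
    # Trailing records that are too few on their own are folded into
    # the last merged block, or kept as a single block if none exists.
    if buffer:
        if merged:
            merged[-1].extend(buffer)
        else:
            merged.append(buffer)
    return merged


# Five raw periods of unequal size, merged so each block has >= 4 records.
periods = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10], [11]]
blocks = merge_small_periods(periods, min_records=4)
print(blocks)
```

A record count is only one possible criterion; a criterion tied to the statistical tests used by the procedure would be closer to the intent of the text.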

3. Experimental Evaluation

3.1. Design of Experiments

In order to evaluate the change detection algorithm, a set of artificially generated datasets was built based on the following characteristics:

• Pre-determined definition and distribution of all variables (candidate input and target).
• Pre-determined set of rules.
• Pure random generation of records.
• Non-correlated datasets (between periods).
• Minimal randomly generated noise.
• No missing data.

In all generated datasets, we have introduced and tested a series of artificially non-correlated changes of various types.
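The design above can be sketched as a small generator. This is an illustration only, not the authors' generator: the variables `a` and `b`, the rules `rule_v1` and `rule_v2`, the 1% noise rate, and the period size are all hypothetical choices made to mirror the listed characteristics (pre-determined distributions and rules, purely random records, independent periods, minimal noise, no missing values), with one change introduced in the last period.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, int]


def rule_v1(r: Record) -> int:
    # Hypothetical pre-determined rule: target depends on inputs a and b.
    return 1 if r["a"] == 1 and r["b"] >= 1 else 0


def rule_v2(r: Record) -> int:
    # Hypothetical changed rule: target now depends on input a only.
    return 1 if r["a"] == 1 else 0


def generate_period(n: int, rule: Callable[[Record], int],
                    noise: float, rng: random.Random) -> List[Record]:
    """Generate n complete records for one period (no missing values)."""
    records = []
    for _ in range(n):
        r = {"a": rng.choice([0, 1]),        # uniform binary input
             "b": rng.choice([0, 1, 2])}     # uniform three-valued input
        y = rule(r)
        if rng.random() < noise:             # minimal random label noise
            y = 1 - y
        r["target"] = y
        records.append(r)
    return records


def agreement(records: List[Record], rule: Callable[[Record], int]) -> float:
    """Fraction of records whose target matches the given rule."""
    return sum(r["target"] == rule(r) for r in records) / len(records)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Three stable periods, then a fourth period generated under a changed rule.
periods = [generate_period(1000, rule_v1, 0.01, rng) for _ in range(3)]
periods.append(generate_period(1000, rule_v2, 0.01, rng))

for i, p in enumerate(periods, start=1):
    print(f"period {i}: agreement with original rule = {agreement(p, rule_v1):.3f}")
```

Agreement with the original rule should stay close to one in the stable periods and drop in the changed period, which is the kind of shift a change detection procedure is expected to flag.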

All datasets were mined with the IFN (Information-Fuzzy Network) program (version 1.2 beta), which is based on the Information-Theoretic Fuzzy Approach to Knowledge Discovery in Databases [Maimon and Last (2000)]. This novel method, developed by Mark Last and Oded Maimon, was shown to have better dimensionality reduction capability, interpretability, and stability than other data mining methods [e.g., see Last et al. (2002)] and was therefore found suitable for this study.

This chapter uses two sets of experiments to evaluate the performance of the change detection procedure:

• The first set aims to estimate the hit rate (also called the “true positive rate”) of the change detection methodology. Twenty-four different changes in two different databases were designed under the rules mentioned above in order to confirm the expected outcomes of the change detection procedure. Table 2 below summarizes the distribution of the artificially generated changes in the experiments on Database#1 and Database#2.

• All changes were tested independently at the minimum 5% confidence level using the following set of hypotheses. All hypotheses were tested separately in order to evaluate the relationship between the tests.
