1. Introduction
As masses of data are incrementally accumulated into large databases over
time, we tend to assume that the new data behaves in a way that resembles
the prior knowledge we have about the operation or facts it describes.
Change detection in time series is not a new subject, and it has been a
topic of continued interest. For instance, Jones et al. [17] have developed
a change detection mechanism for serially correlated multivariate data,
and Yao [39] has estimated the number of change points in a time series
using the BIC criterion (a generic form of this criterion is sketched
below). However, change detection in classification is still not well
developed.
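For orientation, a BIC-type criterion for selecting the number of change
points k takes the following generic penalized-likelihood form (a general
statement of the criterion under the notation defined here, not necessarily
Yao's exact formulation):

    \mathrm{BIC}(k) = -2\,\ln \hat{L}_k + p(k)\,\ln n, \qquad \hat{k} = \arg\min_{k} \mathrm{BIC}(k),

where \hat{L}_k is the maximized likelihood of a model with k change points,
p(k) is its number of free parameters, and n is the length of the series;
the estimated number of change points is the value \hat{k} that minimizes
the criterion.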
There are many algorithms and methods that deal with the incremental
learning problem, which is concerned with updating an induced model upon
receiving new data. These methods are specific to the underlying data
mining model. Examples include Utgoff's method for incremental induction
of decision trees (ITI) [35,36], Wei-Min Shen's semi-incremental learning
method (CDL4) [34], David W. Cheung's technique for updating association
rules in large databases [5], Alfonso Gerevini's constraint network
updating technique [12], Byoung-Tak Zhang's method for feedforward neural
networks (SELF) [40], the standard backpropagation algorithm for neural
networks [27], Liu and Setiono's incremental feature selection method
(LVI) [24], and others.
The main topic in most incremental learning theories is how the model
(which could be a set of rules, a decision tree, a neural network, and so
on) is refined or reconstructed efficiently as new data is encountered.
This problem has been addressed by many of the algorithms mentioned above,
and many of them perform significantly better than running the original
algorithm from scratch, particularly when the records arrive on-line and
the changes are of low magnitude. An important question that one must
examine whenever a new mass of data is accumulated is: "Is it really wise
to keep on reconstructing or verifying the current model, when everything
or something in our notion of the model may have significantly changed?"
In other words, the main problem is not how to reconstruct the model
better, but rather how to detect a change in a model based on a time
series database.
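As a concrete, though simplified, illustration of this distinction, the
following Python sketch monitors the current model's 0/1 error stream and
applies a standard two-proportion z-test between an older reference window
and the newly accumulated window. The window contents, the 0.05
significance level, and the rebuild-versus-refine decision are assumptions
chosen for illustration only; this is not the method developed in this
text.

    import math

    def error_rates_differ(ref_errors, new_errors):
        """Two-proportion z-test on 0/1 error indicators from two windows.

        Returns True when the recent error rate differs significantly
        (two-sided, roughly alpha = 0.05) from the reference error rate.
        """
        n1, n2 = len(ref_errors), len(new_errors)
        p1, p2 = sum(ref_errors) / n1, sum(new_errors) / n2
        pooled = (sum(ref_errors) + sum(new_errors)) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:          # degenerate case: both windows error-free or all-error
            return False
        z = abs(p1 - p2) / se
        return z > 1.96      # two-sided critical value for alpha = 0.05

    # Hypothetical error streams: 1 marks a misclassified record.
    ref_errors = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # older batch
    new_errors = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]   # newly accumulated batch

    if error_rates_differ(ref_errors, new_errors):
        print("change detected: reconstruct the model from scratch")
    else:
        print("no significant change: keep refining the model incrementally")

Any comparable drift test (for example, a Hoeffding bound on the
difference of the two error rates) could replace the z-test here; the
point is only that detection, rather than reconstruction, is the primary
question.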
Some researchers have proposed various representations of the problem of
large, time-changing databases and populations, including:
- defining robustness and discovering robust knowledge from databases [15],
  or learning stable concepts [13] in domains with hidden changes in
  concepts;
- identifying and modeling a persistent drift in a database [11].