CHANGE DETECTION IN CLASSIFICATION MODELS INDUCED FROM TIME SERIES DATA - Data Mining in Time Series Databases

tests within the available hypothesis space, which generalize the relationship between a set of candidate input variables and the target variable. The following notation is a general description of the data mining classification modeling task: Given a database $D$, containing a complete set of records $X = (A|T)$, where $A$ is a vector of candidate input variables (attributes from the examined phenomenon, which might have some influence on the target concept) and $T$ is a target variable (i.e. the target concept), find the best set of hypotheses $H$ within the available hypothesis space, which generalize the relationship between a set of candidate input variables and the target variable (e.g. the model $M$) using some Data Mining algorithm.

Each record is regarded as a complete set of conjunction between attributes and a target concept (or variable), such as

$$X_i = \left(A_{1i} = a_{1j(1)},\; A_{2i} = a_{2j(2)},\; \ldots,\; A_{ni} = a_{nj(n)},\; T_i = t_j\right)$$

where $(A_1, A_2, \ldots, A_n)$ is a known set of attributes of the desired phenomenon and $(T)$ is a known discrete or continuous target variable or concept. The goal for the classification task is to generate the best set of hypotheses to describe the model $M$ (also noted in literature as …) using an algorithm $G$. To simplify, this means generating the following general equation:

$$M_G : A \rightarrow T \qquad (1)$$
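To make the mapping in Eq. (1) concrete, the following is a minimal sketch of a model that maps an attribute vector to a target value. The function `induce_model` is a hypothetical stand-in for the algorithm G (a simple majority-vote lookup table, not the method used in this chapter), and the toy records are invented purely for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# A record X_i: (attribute values A, target value T).
Attributes = Tuple[Hashable, ...]
Record = Tuple[Attributes, Hashable]
Model = Callable[[Attributes], Hashable]


def induce_model(records: List[Record]) -> Model:
    """Hypothetical algorithm G: build a model M_G mapping A to T.

    Each distinct attribute vector yields one hypothesis that predicts the
    most frequent target observed with it. Unseen attribute vectors fall
    back to the overall majority target.
    """
    if not records:
        raise ValueError("need at least one record")

    per_vector: Dict[Attributes, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for attributes, target in records:
        per_vector[attributes][target] += 1
        overall[target] += 1

    default = overall.most_common(1)[0][0]
    table = {a: c.most_common(1)[0][0] for a, c in per_vector.items()}

    def model(attributes: Attributes) -> Hashable:
        return table.get(attributes, default)

    return model


# Invented toy database D with two attributes and a discrete target.
records = [
    (("sunny", "hot"), "no"),
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "mild"), "no"),
]

m_g = induce_model(records)
print(m_g(("rainy", "mild")))   # yes (2 of the 3 matching records)
print(m_g(("cloudy", "cool")))  # no (overall majority, vector unseen)
```

Any induction algorithm that returns a callable from attribute vectors to target values would fit the same shape; the lookup table is only the simplest choice that keeps the sketch self-contained.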

Naturally, not all attributes prove statistically significant enough to be included in the set of hypotheses.

In most algorithms, the database $D$ is divided into two parts: learning ($D_{learn}$) and validation ($D_{val}$) sets. The first is supposed to hold enough information for assembling a statistically significant and stable model based on the $G$ algorithm. The second part is supposed to ensure that the algorithm performs its goals by validation of the built models on unseen records. Evaluating the prediction accuracy of any model $M$, which is built by a classification algorithm $G$, is commonly performed by estimating the Validation Error Rate of the examined model.
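The split into learning and validation sets and the resulting error estimate can be sketched as follows. This assumes the usual definition of the validation error rate for a discrete target (the fraction of validation records the model mispredicts). The names `split_database`, `validation_error_rate`, and the 30% validation share are illustrative choices, not taken from the chapter.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

Attributes = Tuple[Hashable, ...]
Record = Tuple[Attributes, Hashable]


def split_database(
    records: Sequence[Record], val_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Shuffle the database D and split it into (D_learn, D_val)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def validation_error_rate(
    model: Callable[[Attributes], Hashable], d_val: Sequence[Record]
) -> float:
    """Fraction of validation records whose target the model mispredicts."""
    if not d_val:
        raise ValueError("validation set is empty")
    errors = sum(1 for attributes, target in d_val if model(attributes) != target)
    return errors / len(d_val)


# Invented records: one numeric attribute, target is its parity.
database = [((i,), "even" if i % 2 == 0 else "odd") for i in range(10)]
d_learn, d_val = split_database(database, val_fraction=0.3)

# A deliberately naive model that always predicts "even".
always_even = lambda attributes: "even"
print(len(d_learn), len(d_val))
print(validation_error_rate(always_even, d_val))
```

Because the validation records are held out from model construction, the error rate measured on them is an estimate of how the model behaves on unseen records, which is the role the text assigns to the validation set.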

When the database $D$ is not fixed but, alternatively, accumulated over time, the classification task should be altered: in every period $K$, a new set of records $X_K$ is accumulated to the database. $d_K$ will be the notation for the set of records $X_K$ that was added in the start of period $K$, and $D_K$ will be the notation for the accumulated database $D_K = \bigcup_{k=1,\ldots,K} d_k$. Therefore, given a database $D_K$, containing a complete set of records $X_K$, generate the best set of hypotheses $H_K$ to describe the accumulated model $M_K$, and a new question is encountered: is $M_{K,G} = M_{K+1,G}$, in every
