4. Segmentation of Continuous Data Streams
Based on a Change Detection Methodology
Gil Zeira (1), Mark Last (2), and Oded Maimon (3)
(1) Department of Industrial Engineering, Tel-Aviv University, Tel Aviv
69978, Israel; email: gil.zeira@ness.com
(2) Department of Information Systems Engineering, Ben-Gurion University of
the Negev, Beer-Sheva 84105, Israel; email: mlast@bgu.ac.il
(3) Department of Industrial Engineering, Tel-Aviv University, Tel Aviv 69978,
Israel; email: maimon@eng.tau.ac.il
Most data mining algorithms assume that historical data are the best estimator of
what will happen in the future. As more data accumulate in a database, one
should examine whether the new data agree with the model induced from
previous instances. The problem of recognizing a change in the underlying
model is known as the change detection problem. Once all change points have been
detected, a data stream can be represented as a series of nonoverlapping segments.
This work presents a new methodology for change detection and segmentation
based on a set of statistical estimators. While traditional segmentation methods are
aimed at analyzing univariate time series, our methodology detects statistically
significant changes in incrementally built data mining classification models. In
our previous work, we showed the methodology to be valid for change
detection on a set of artificial and benchmark data sets. In this work, we apply the
change detection procedure to real-world data sets from two distinct domains
(education and finance), where we detect significant changes between succeeding
segments and compare the quality of alternative segmentations.
4.1 Introduction
Event detection problems are concerned with recognizing either a change in the
parameter(s) of a model or a change of the model itself. The most common
representation of a univariate time series is a piecewise linear approximation. A
straight line representing each segment can be found by linear interpolation or
linear regression.
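The two ways of fitting a segment line mentioned above can be sketched as follows. This is a minimal illustration, not the chapter's method: the function names, the use of time indices as x-values, and the assumption that segment boundaries are already known are all hypothetical choices made for the example.

```python
def interpolate_segment(xs, ys):
    """Linear interpolation: the line through the segment's first and last points."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    intercept = ys[0] - slope * xs[0]
    return slope, intercept


def regress_segment(xs, ys):
    """Linear regression: the least-squares line through all points of the segment."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def piecewise_linear(ys, boundaries, fit):
    """Fit one line per segment.

    `boundaries` is an assumed, pre-computed list of cut indices such as
    [0, 5, 10]; each segment covers indices start..end-1 and must contain
    at least two points.
    """
    segments = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        xs = list(range(start, end))
        slope, intercept = fit(xs, ys[start:end])
        segments.append((start, end, slope, intercept))
    return segments


# A toy series that rises and then falls, cut at index 5.
series = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
by_interpolation = piecewise_linear(series, [0, 5, 10], interpolate_segment)
by_regression = piecewise_linear(series, [0, 5, 10], regress_segment)
```

Interpolation uses only the two endpoints, so it is cheap but sensitive to noise at the segment edges; regression uses every point in the segment and generally does not pass through the endpoints. On exactly linear data, as in the toy series, the two fits coincide.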
Change detection in time-series regression models has long been a topic of
interest. For instance, Jones et al. [18] have developed a change-detection
mechanism for serially correlated multivariate data. Yao [38] has estimated the
number of change points in a time series using the Bayesian information criterion
(BIC). The bottom-up segmentation algorithm of Keogh [20] starts with a large
number of equal-size segments and proceeds by iteratively merging pairs of
adjacent segments. Guralnik and
Srivastava [12] use likelihood criteria to perform recursive binary partitioning of
 