SEGMENTING TIME SERIES: A SURVEY AND NOVEL APPROACH - Data Mining in Time Series Databases

Database Reference

In-Depth Information

Table 3. The generic Top-Down algorithm.

Algorithm

Algorithm Seg TS = Top Down(T, max error)

best so far = inf;

for

for i = 2 ttto length(T) - 2 // Find the best splitting point.

improvement in approximation = improvement splitting here(T, i);

iiif improvement in approximation < best so far

breakpoint = i;

best so far = improvement in approximation;

end;

// Recursively split the left segment if necessary.

iiif calculate error(T[1:breakpoint]) > max error

Seg TS = Top Down(T[1:breakpoint]);

end;

// Recursively split the right segment if necessary.

iiif calculate error(T[breakpoint + 1:length(T)]) > max error

Seg TS = Top Down(T[breakpoint + 1:length(T)]);

end;

Peucker (1973)]; in image processing, it is known as Ramer's algorithm

[Ramer (1972)]. Most researchers in the machine learning/data mining com-

munity are introduced to the algorithm in the classic textbook by Duda and

Harts, which calls it “ Iterative End-Points Fits ” [Duda and Hart (1973)].

In the data mining community, the algorithm has been used by [Li et al.

(1998)] to support a framework for mining sequence databases at multiple

abstraction levels. Shatkay and Zdonik use it (after considering alternatives

such as Sliding Windows) to support approximate queries in time series

databases [Shatkay and Zdonik (1996)].

Park et al. introduced a modification where they first perform a scan

over the entire dataset marking every peak and valley [Park et al. (1999)].

These extreme points used to create an initial segmentation, and the Top-

Down algorithm is applied to each of the segments (in case the error on an

individual segment was still too high). They then use the segmentation to

support a special case of dynamic time warping. This modification worked

well on the smooth synthetic dataset it was demonstrated on. But on real

world data sets with any amount of noise, the approximation is greatly

overfragmented.

Lavrenko et al. uses the Top-Down algorithm to support the concurrent

mining of text and time series [Lavrenko et al. (2000)]. They attempt to

discover the influence of news stories on financial markets. Their algorithm

contains some interesting modifications including a novel stopping criteria

basedonthet-test.

Search WWH ::

Custom Search

Home