Database Reference
In-Depth Information
Table 3. The generic Top-Down algorithm.
Algorithm
Algorithm Seg TS = Top Down(T, max error)
best so far = inf;
for
for
for i = 2 ttto length(T) - 2 // Find the best splitting point.
improvement in approximation = improvement splitting here(T, i);
iiif improvement in approximation < best so far
breakpoint = i;
best so far = improvement in approximation;
end;
end;
end;
end;
end;
// Recursively split the left segment if necessary.
iiif calculate error(T[1:breakpoint]) > max error
Seg TS = Top Down(T[1:breakpoint]);
end;
end;
end;
// Recursively split the right segment if necessary.
iiif calculate error(T[breakpoint + 1:length(T)]) > max error
Seg TS = Top Down(T[breakpoint + 1:length(T)]);
end;
end;
end;
end;
Peucker (1973)]; in image processing, it is known as Ramer's algorithm
[Ramer (1972)]. Most researchers in the machine learning/data mining com-
munity are introduced to the algorithm in the classic textbook by Duda and
Harts, which calls it “ Iterative End-Points Fits ” [Duda and Hart (1973)].
In the data mining community, the algorithm has been used by [Li et al.
(1998)] to support a framework for mining sequence databases at multiple
abstraction levels. Shatkay and Zdonik use it (after considering alternatives
such as Sliding Windows) to support approximate queries in time series
databases [Shatkay and Zdonik (1996)].
Park et al. introduced a modification where they first perform a scan
over the entire dataset marking every peak and valley [Park et al. (1999)].
These extreme points used to create an initial segmentation, and the Top-
Down algorithm is applied to each of the segments (in case the error on an
individual segment was still too high). They then use the segmentation to
support a special case of dynamic time warping. This modification worked
well on the smooth synthetic dataset it was demonstrated on. But on real
world data sets with any amount of noise, the approximation is greatly
overfragmented.
Lavrenko et al. uses the Top-Down algorithm to support the concurrent
mining of text and time series [Lavrenko et al. (2000)]. They attempt to
discover the influence of news stories on financial markets. Their algorithm
contains some interesting modifications including a novel stopping criteria
basedonthet-test.
Search WWH ::




Custom Search