SEGMENTING TIME SERIES: A SURVEY AND NOVEL APPROACH - Data Mining in Time Series Databases

Database Reference

In-Depth Information

is still less than max error ) then the algorithm continues to increment

i

as in the classic algorithm. Otherwise they begin to decrement

until the

measured error is less than max error . This optimization can greatly speed

up the algorithm if the mean length of segments is large in relation to

the standard deviation of their length. The monotonically non-decreasing

property of residual error also allows binary search for the length of the

segment. Surprisingly, no one we are aware of has suggested this.

The Sliding Window algorithm can give pathologically poor results

under some circumstances, particularly if the time series in question con-

tains abrupt level changes. Most researchers have not reported this [Qu

et al. (1998), Wang and Wang (2000)], perhaps because they tested the

algorithm on stock market data, and its relative performance is best on

noisy data. Shatkay (1995), in contrast, does notice the problem and gives

elegant examples and explanations [Shatkay (1995)]. They consider three

variants of the basic algorithm, each designed to be robust to a certain

case, but they underline the diculty of producing a single variant of the

algorithm that is robust to arbitrary data sources.

Park et al. (2001) suggested modifying the algorithm to create “ mono-

tonically changing ” segments [Park et al. (2001)]. That is, all segments con-

sist of data points of the form of

i

t 1 ≥ t 2 ≥ ··· ≥ t n .

This modification worked well on the smooth synthetic dataset it was

demonstrated on. But on real world datasets with any amount of noise,

the approximation is greatly overfragmented.

Variations on the Sliding Window algorithm are particularly popular

with the medical community (where it is known as FAN or SAPA), since

patient monitoring is inherently an online task [Ishijima et al. (1983), Koski

et al. (1995), McKee et al. (1994), Vullings et al. (1997)].

t 1 ≤ t 2 ≤ ··· ≤ t n or

2.2. The Top-Down Algorithm

The Top-Down algorithm works by considering every possible partitioning

of the times series and splitting it at the best location. Both subsections

are then tested to see if their approximation error is below some user-

specified threshold. If not, the algorithm recursively continues to split the

subsequences until all the segments have approximation errors below the

threshold. The pseudocode for the algorithm is shown in Table 3.

Variations on the Top-Down algorithm (including the 2-dimensional

case) were independently introduced in several fields in the early 1970's.

In cartography, it is known as the Douglas-Peucker algorithm [Douglas and

Search WWH ::

Custom Search

Home