SEGMENTING TIME SERIES: A SURVEY AND NOVEL APPROACH - Data Mining in Time Series Databases

Database Reference

In-Depth Information

Using the buffer allows us to gain a “semi-global” view of the data set for

Bottom-Up. However, it important to impose upper and lower bounds on

the size of the window. A buffer that is allowed to grow arbitrarily large will

revert our algorithm to pure Bottom-Up, but a small buffer will deteriorate

it to Sliding Windows, allowing excessive fragmentation may occur. In our

algorithm, we used an upper (and lower) bound of twice (and half) of the

initial buffer.

Our algorithm can be seen as operating on a continuum between the

two extremes of Sliding Windows and Bottom-Up. The surprising result

(demonstrated below) is that by allowing the buffer to contain just 5 or

6 times the data normally contained by is a single segment, the algorithm

produces essentially the same results as Bottom-Up, yet is able process

a never-ending stream of data. Our new algorithm requires only a small,

constant amount of memory, and the time complexity is a small constant

factor worse than that of the standard Bottom-Up algorithm.

4.2. Experimental Validation

We repeated the experiments in Section 3, this time comparing the new

algorithm with pure (batch) Bottom-Up and classic Sliding Windows. The

result, summarized in Figure 6, is that the new algorithm produces results

that are essentiality identical to Bottom-Up. The reader may be surprised

that SWAB can sometimes be slightly better than Bottom-Up. The reason

why this can occur is because SWAB is exploring a slight larger search

space. Every segment in Bottom-Up must have an even number of data-

points, since it was created by merging other segments that also had an even

number of segments. The only possible exception is the rightmost segment,

which can have an even number of segments if the original time series had

an odd length. Since this happens multiple times for SWAB, it is effectively

searching a slight larger search space.

5. Conclusions and Future Directions

We have seen the first extensive review and empirical comparison of time

series segmentation algorithms from a data mining perspective. We have

shown the most popular approach, Sliding Windows, generally produces

very poor results, and that while the second most popular approach, Top-

Down, can produce reasonable results, it does not scale well. In contrast,

the least well known, Bottom-Up approach produces excellent results and

scales linearly with the size of the dataset.

Search WWH ::

Custom Search

Home