Database Reference
In-Depth Information
Using the buffer allows us to gain a “semi-global” view of the data set for
Bottom-Up. However, it important to impose upper and lower bounds on
the size of the window. A buffer that is allowed to grow arbitrarily large will
revert our algorithm to pure Bottom-Up, but a small buffer will deteriorate
it to Sliding Windows, allowing excessive fragmentation may occur. In our
algorithm, we used an upper (and lower) bound of twice (and half) of the
initial buffer.
Our algorithm can be seen as operating on a continuum between the
two extremes of Sliding Windows and Bottom-Up. The surprising result
(demonstrated below) is that by allowing the buffer to contain just 5 or
6 times the data normally contained by is a single segment, the algorithm
produces essentially the same results as Bottom-Up, yet is able process
a never-ending stream of data. Our new algorithm requires only a small,
constant amount of memory, and the time complexity is a small constant
factor worse than that of the standard Bottom-Up algorithm.
4.2. Experimental Validation
We repeated the experiments in Section 3, this time comparing the new
algorithm with pure (batch) Bottom-Up and classic Sliding Windows. The
result, summarized in Figure 6, is that the new algorithm produces results
that are essentiality identical to Bottom-Up. The reader may be surprised
that SWAB can sometimes be slightly better than Bottom-Up. The reason
why this can occur is because SWAB is exploring a slight larger search
space. Every segment in Bottom-Up must have an even number of data-
points, since it was created by merging other segments that also had an even
number of segments. The only possible exception is the rightmost segment,
which can have an even number of segments if the original time series had
an odd length. Since this happens multiple times for SWAB, it is effectively
searching a slight larger search space.
5. Conclusions and Future Directions
We have seen the first extensive review and empirical comparison of time
series segmentation algorithms from a data mining perspective. We have
shown the most popular approach, Sliding Windows, generally produces
very poor results, and that while the second most popular approach, Top-
Down, can produce reasonable results, it does not scale well. In contrast,
the least well known, Bottom-Up approach produces excellent results and
scales linearly with the size of the dataset.
Search WWH ::




Custom Search