Database Reference
In-Depth Information
Table 5. A feature summary for the 3 major algorithms.
Algorithm
User can
Online
Complexity
specify 1
O ( n 2 K )
Top-Down
E, ME, K
No
Bottom-Up
E, ME, K
No
O ( Ln )
Sliding Window
E
Yes
O ( Ln )
1 KEY: E → Maximum error for a given segment, ME →
Maximum error for a given segment for entire time series,
K → Number of segments.
whether the algorithm is online or batch. Secondly, there is the question
of how the user can specify the quality of desired approximation. With
trivial modifications the Bottom-Up algorithm allows the user to specify
the desired value of
, the maximum error per segment, or total error
of the approximation. A (non-recursive) implementation of Top-Down can
also be made to support all three options. However Sliding Window only
allows the maximum error per segment to be specified.
K
3. Empirical Comparison of the Major
Segmentation Algorithms
In this section, we will provide an extensive empirical comparison of the
three major algorithms. It is possible to create artificial datasets that allow
one of the algorithms to achieve zero error (by any measure), but forces
the other two approaches to produce arbitrarily poor approximations. In
contrast, testing on purely random data forces the all algorithms to pro-
duce essentially the same results. To overcome the potential for biased
results, we tested the algorithms on a very diverse collection of datasets.
These datasets where chosen to represent the extremes along the fol-
lowing dimensions, stationary/non-stationary, noisy/smooth, cyclical/non-
cyclical, symmetric/asymmetric, etc. In addition, the data sets represent
the diverse areas in which data miners apply their algorithms, includ-
ing finance, medicine, manufacturing and science. Figure 3 illustrates the
10 datasets used in the experiments.
3.1. Experimental Methodology
For simplicity and brevity, we only include the linear regression versions
of the algorithms in our study. Since linear regression minimizes the sum
of squares error, it also minimizes the Euclidean distance (the Euclidean
Search WWH ::




Custom Search