Database Reference
In-Depth Information
5.2 Methods for Data Segmentation
In [34], the piecewise linear approximation algorithms are categorized
in three groups: sliding window, top-down and bottom-up. The slid-
ing window approach expands the data segment as long as the data
tuples fit. The bottom-up approach first applies basic data segmenta-
tion employing the sliding window approach. Then, for two consecutive
segments, it calculates merging cost in terms of an approximation error.
Subsequently, it merges the segments with the minimum cost within the
maximum allowed approximation error, and updates the merging costs
of the updated segments. The process ends when no further merging
can be done without violating the maximum approximation error. The
top-down approach recursively splits the stream into two segments, so
as to obtain longest segments with the lowest error until all segments
are approximated within the maximum allowed error.
Among these three groups, only the sliding window approach can be
used online, but it employs look-ahead. The other two approaches per-
form better than the sliding window approach, but they need to scan
all data, hence they cannot be used for approximating streaming data.
Based on this observation, Keogh et al. [34] propose a new algorithm
that combines the online processing property of the sliding window ap-
proach and the performance of the bottom-up approach. This approach
needs a predefined buffer length. If the buffer is small, then it may
produce many small data segments; if the buffer is large, then there is
a delay in returning the approximated data segment. The maximum
look-ahead size is constrained by the maximum allowed delay between
data production and data reporting or data archiving.
5.3 Piecewise Approximation
Among several different data stream approximation techniques, piece-
wise linear approximation has been the most widely used [34, 39]. Piece-
wise linear approximation models the data stream with a separate linear
function per data segment. Piecewise constant approximation (PCA) ap-
proximates a data segment with a constant value, which can be the first
value of the segment (referred to as the cache filter) [47], the mean value
or the median value (referred to as poor man's compression - midrange
(PMC-MR) [39]).
In the cache filter, for all the sensor values in a segment g k ,thefol-
lowing condition should be satisfied:
v i k− 1 + p −
v i k− 1 +1 <
for p =1 ,...,i k ,
(2.13)
Search WWH ::




Custom Search