Database Reference
In-Depth Information
its algorithm for one-dimensional data. Given the arrival of two data
tuples ( t 1 ,v 1 )and( t 2 ,v 2 ) of the first segment of the data stream, the
swing filter maintains a set of lines, bounded by an upper line u 1 and a
lower line l 1 . u 1 is defined by the pair of points ( t 1 ,v 1 )and( t 2 ,v 2 + ),
while l 1 is defined by the pair of points ( t 1 ,v 1 )and( t 2 ,v 2 ), where is
the maximum approximation error bound. Any line segment between u 1
and l 1 can represent the first two data tuples. When ( t 3 ,v 3 ) arrives, first
it is checked whether it falls within the lines l 1 , u 1 . Then, in order to
maintain the invariant that all lines within the set can represent all data
tuples so far, l 1 (respectively u 1 ) may have to be adjusted to the higher-
slope (respectively lower-slope) line defined by the pair of data tuples
(( t 1 ,v 1 ) , ( t 3 ,v 3
)) (respectively (( t 1 ,v 1 ) , ( t 3 ,v 3 + ))). Lines below this
new l 1 or above this new u 1 cannot represent the data tuple ( t 3 ,v 3 ). The
segment estimation continues until the new data tuple falls out of the
upper and lower lines for a segment. The generated line segment for the
completed filtering interval is chosen so as to minimize the mean square
error for the data tuples observed in that interval. As opposed to the
slide filter (described below), in the swing filter the new data segment
starts from the end point of the previous data segment.
In the slide filter, the operation is similar to the swing filter, but upper
and lower lines u and l are defined differently. Specifically, after ( t 1 ,v 1 )
and ( t 2 ,v 2 ) arrive, u 1 is defined by the pair of data tuples ( t 1 ,v 1
)and( t 2 ,v 2 + ), while l 1 is defined by ( t 1 ,v 1 + )and( t 2 ,v 2
).
After the arrival of ( t 3 ,v 3 ), l 1 (respectively u 1 ) may need to be adjusted
to the higher-slope (respectively lower-slope) line defined by (( t j ,v j +
) , ( t 3 ,v 3 )) (respectively (( t i ,v i ) , ( t 3 ,v 3 + ))), where i ∈ [1 , 2]. The
slide filter also includes a look-ahead of one segment, in order to produce
connected segments instead of disconnected segments, when possible.
Palpanas et al. [48] employ amnesic functions and propose novel tech-
niques that are applicable to a wide range of user-defined approximating
functions. According to amnesic functions, recent data is approximated
with higher accuracy, while higher error can be tolerated for older data.
Yi and Faloutsos [70] suggested approximating a data stream by dividing
it into equal-length segments and recording the mean value of the sen-
sor values that fall within the segment (referred to as segmented means
or as piecewise aggregate approximation (PAA)). On the other hand,
adaptive piecewise constant approximation (APCA) [6] allows segments
to have arbitrary lengths.
5.3.2 Piecewise Linear Approximation. The piecewise
linear approximation uses the linear regression model for compressing
Search WWH ::




Custom Search