ratio segment, instead of storing the original data segment. The idea
here is that, in practice, the ratio segment is flat and therefore can be
significantly compressed as compared to the original data segment.
Furthermore, the objective of the GAMPS approach is to identify a
set of base segments and associate every data segment with a base
segment, such that the ratio segment can be used to reconstruct the
data segment within an L∞ error bound. The identification
of the base segments is posed as a facility location problem. Since this
problem is NP-hard, a polynomial-time approximation algorithm is used
to solve it, producing the base segments and the assignment
between base segments and data segments.
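The ratio-segment idea can be illustrated with a minimal Python sketch. It is not the GAMPS algorithm itself: it assumes the ratio is the element-wise quotient of a data segment and its base segment, that the base segment contains no zeros, and that a nearly flat ratio can be summarized by a single constant (its midrange). All function names and sample values are hypothetical.

```python
def ratio_segment(data, base):
    """Element-wise ratio data[i] / base[i]; assumes base has no zeros."""
    return [d / b for d, b in zip(data, base)]


def compress_flat(ratio):
    """Summarize a nearly flat ratio segment by one constant (its midrange)."""
    return (min(ratio) + max(ratio)) / 2.0


def reconstruct(base, c):
    """Rebuild an approximation of the data segment from the base and the constant."""
    return [b * c for b in base]


def max_abs_error(original, approx):
    """Maximum absolute (L-infinity) reconstruction error."""
    return max(abs(x - y) for x, y in zip(original, approx))


# Hypothetical segments: data is roughly twice the base.
base = [10.0, 12.0, 11.0, 13.0]
data = [20.1, 23.9, 22.0, 26.2]

ratio = ratio_segment(data, base)
c = compress_flat(ratio)
approx = reconstruct(base, c)
err = max_abs_error(data, approx)
```

Here one constant stands in for the whole data segment, and `err` is the quantity that would be checked against the prescribed error bound.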
Prior to GAMPS, Deligiannakis et al. [14] proposed the self-based
regression (SBR) algorithm that also finds a base signal for compressing
historical sensor data based on spatial correlations among different data
streams. The base signal for each segment captures the prominent
features of the other signals, and SBR finds piecewise correlations (based
on linear regression) to the base signal. Lin et al. [42] proposed an
algorithm, referred to as adaptive linear vector quantization (ALVQ), which
improves SBR in two ways: (i) it increases the precision of compression,
and (ii) it reduces the bandwidth consumption by compressing the
update of the base signal.
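As a rough illustration of the regression step mentioned for SBR, the sketch below fits one interval of a data stream to a base signal by ordinary least squares, so that the interval can be represented by two coefficients. This is only an assumed simplification: it does not show how SBR constructs the base signal or chooses its intervals, and the function name and sample values are hypothetical.

```python
def fit_to_base(segment, base):
    """Least-squares fit segment[i] ~ a * base[i] + b.

    Assumes both lists have equal length and base is not constant.
    """
    n = len(segment)
    mean_x = sum(base) / n
    mean_y = sum(segment) / n
    sxx = sum((x - mean_x) ** 2 for x in base)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(base, segment))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical interval that is exactly 2 * base + 3.
base_signal = [1.0, 2.0, 4.0, 3.0]
segment = [5.0, 7.0, 11.0, 9.0]

a, b = fit_to_base(segment, base_signal)
fitted = [a * x + b for x in base_signal]
```

Storing `(a, b)` per interval, plus the shared base signal, replaces storing every value of the interval.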
5.5 Multi-Model Data Compression
The potential burstiness of the data streams and the error introduced
by the sensors often result in limited effectiveness of a single model for
approximating a data stream within the prescribed error bound. Acknowledging
this, Lazaridis et al. [39] argue that a global approximation
model may not be the best approach and mention the potential need for
using multiple models. In [40], it is also recognized that different
approximation models are more appropriate for data streams of different
statistical properties. The approach in [40] aims to find the best model
approximating the data stream based on the overall hit ratio (i.e., the
ratio of the number of data tuples fitting the model to the total number
of data tuples).
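The hit ratio defined above can be sketched in a few lines of Python. The sketch assumes that a tuple "fits" a model when its value lies within a tolerance eps of the model's prediction; that criterion, the two candidate models, and the sample stream are hypothetical and are not taken from [40].

```python
def hit_ratio(values, predict, eps):
    """Fraction of tuples whose value is within eps of the model's prediction."""
    hits = sum(1 for i, v in enumerate(values) if abs(v - predict(i)) <= eps)
    return hits / len(values)


# Hypothetical stream with one outlier at the end.
stream = [1.0, 2.1, 2.9, 4.2, 5.0, 9.0]

# Hypothetical candidate models, each mapping a tuple index to a predicted value.
models = {
    "constant": lambda i: 3.0,
    "linear": lambda i: 1.0 + i,
}

eps = 0.25
ratios = {name: hit_ratio(stream, model, eps) for name, model in models.items()}
best = max(ratios, key=ratios.get)
```

With these values the linear model fits five of the six tuples and the constant model fits one, so `best` is `"linear"`.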
Papaioannou et al. [50] aim to effectively find the best combination
of different models for approximating various segments of the stream
regardless of the error norm. They argue that the selection of the most
efficient model depends on the characteristics of the data stream, namely
rate, burstiness, data range, etc., which cannot always be known a priori
for sensors and may even be dynamic. Their approach dynamically
adapts to the properties of the data stream and approximates each data