5.1 Overview of Sensor Data Compression System
The goal of the sensor data compression system is to approximate a
sensor data stream by a set of functions. The data compression methods
studied in this section permit approximation errors, which are
characterized by a specific error norm. A standard approach to sensor
data compression is to segment the data stream into data segments and
then approximate each data segment so that a specific error norm is
satisfied. For example, under the $L_\infty$ norm, each sensor value of
the data stream is approximated within a specified error bound.
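As a minimal sketch of what satisfying the $L_\infty$ norm means in practice, the following Python snippet checks whether a sequence of approximated values stays within a bound of the raw sensor values. The function names and the parameter name `epsilon` are illustrative assumptions and do not come from the text.

```python
def max_abs_error(values, approximations):
    """Return the L-infinity error: the largest absolute deviation
    between raw values and their approximations."""
    if len(values) != len(approximations):
        raise ValueError("sequences must have equal length")
    return max(
        (abs(v - a) for v, a in zip(values, approximations)),
        default=0.0,
    )


def within_bound(values, approximations, epsilon):
    """True if every value is approximated within epsilon
    (epsilon is an assumed name for the error bound)."""
    return max_abs_error(values, approximations) <= epsilon
```

For instance, with raw values `[1.0, 2.0, 4.0]` and approximations `[1.5, 2.0, 3.0]`, the largest deviation is 1.0, so the approximation is acceptable for a bound of 1.0 but not for a bound of 0.5.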
Let us assume that we have $K$ segments of a data stream. We denote
these segments as $g_1, g_2, \ldots, g_K$, where $g_1$ approximates the
data tuples $((t_1, v_1), \ldots, (t_{i_1}, v_{i_1}))$, while $g_k$,
where $k = 2, \ldots, K$, approximates the data items
$((t_{i_{k-1}+1}, v_{i_{k-1}+1}), (t_{i_{k-1}+2}, v_{i_{k-1}+2}), \ldots, (t_{i_k}, v_{i_k}))$.
Similar to [20], we distinguish between two classes of segments used for
approximation, namely connected segments and disconnected segments.
In connected segments, the ending point of the previous segment is the
starting point of the new segment. In contrast, in disconnected
segments, the approximation of the new segment starts from the
subsequent data item in the stream. Disconnected segments offer more
approximation flexibility and may lead to fewer segments; however, for
linear approximation [35], they require storing two data tuples (i.e.,
start tuple and end tuple) per data segment, whereas connected segments
can share the tuple at each segment boundary.
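The difference between the two classes can be illustrated with a simple greedy sliding-window segmentation. This is only a sketch under stated assumptions, not the method of [20] or [35]: each segment is the straight line through its first and last data tuple, it is extended while every point in between stays within an assumed bound `epsilon`, and timestamps are assumed to be strictly increasing. The names `fits` and `segment` are hypothetical.

```python
def fits(points, start, end, epsilon):
    """True if the line through points[start] and points[end] is within
    epsilon of every point strictly between them (L-infinity check)."""
    if end == start:
        return True
    (t0, v0), (t1, v1) = points[start], points[end]
    slope = (v1 - v0) / (t1 - t0)  # assumes strictly increasing timestamps
    return all(
        abs(v0 + slope * (t - t0) - v) <= epsilon
        for t, v in points[start + 1:end]
    )


def segment(points, epsilon, connected):
    """Greedy segmentation; returns (start_index, end_index) pairs."""
    segments = []
    n = len(points)
    start = 0
    while start < n:
        end = start
        # Extend the segment until the next point would break the bound.
        while end + 1 < n and fits(points, start, end + 1, epsilon):
            end += 1
        segments.append((start, end))
        if end == n - 1:
            break
        # Connected: reuse the end point; disconnected: start at the next item.
        start = end if connected else end + 1
    return segments


# A stream with a jump between t=2 and t=3.
points = [(0, 0), (1, 0), (2, 0), (3, 5), (4, 5), (5, 5)]
print(segment(points, 0.1, connected=True))   # [(0, 2), (2, 3), (3, 5)]
print(segment(points, 0.1, connected=False))  # [(0, 2), (3, 5)]
```

On this stream, the connected variant needs an extra segment to bridge the jump, while the disconnected variant simply starts a new segment at the next data item.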
Since functions are employed for approximating data segments, only
the approximated data segments are stored in the database, instead
of the raw sensor values of the data stream [64, 50]. A schema for
linear segments is presented in [64], consisting of a table, referred to
as FunctionTable, where each row represents a linear model with
attributes start time, end time, slope and intercept (i.e., base) of the
segment. In the case of connected segments [20], the end time attribute
can be omitted.
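A table of this shape could be sketched as follows using Python's built-in `sqlite3` module. The column names with underscores, the column types, the choice of `start_time` as primary key, and the convention that a value is reconstructed as `slope * t + intercept` are all assumptions made for illustration; the schema in [64] may differ in these details.

```python
import sqlite3


def create_function_table(conn):
    """Create a table with one row per linear segment (assumed layout)."""
    conn.execute(
        """CREATE TABLE FunctionTable (
               start_time REAL PRIMARY KEY,
               end_time   REAL NOT NULL,
               slope      REAL NOT NULL,
               intercept  REAL NOT NULL
           )"""
    )


def insert_segment(conn, start_time, end_time, slope, intercept):
    """Store one linear model covering [start_time, end_time]."""
    conn.execute(
        "INSERT INTO FunctionTable VALUES (?, ?, ?, ?)",
        (start_time, end_time, slope, intercept),
    )


def value_at(conn, t):
    """Reconstruct the approximated value at time t, or None if no
    stored segment covers t. Assumes v(t) = slope * t + intercept."""
    row = conn.execute(
        "SELECT slope, intercept FROM FunctionTable "
        "WHERE start_time <= ? AND ? <= end_time "
        "ORDER BY start_time LIMIT 1",
        (t, t),
    ).fetchone()
    if row is None:
        return None
    slope, intercept = row
    return slope * t + intercept
```

A query for a timestamp then touches a single row, rather than scanning the raw sensor values that the segment replaces.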
A more generic schema for storing data streams, approximated by
multiple models, was proposed in [50]. It consists of one table, referred
to as the SegmentTable, for storing data segments, and a second table
segment in the time interval [start time, end time]. The attribute id
identifies the model that is used in the segment. The primary key in the
SegmentTable is the start time, while in the