Conventional sensor data is noisy because sensor readings are often derived by converting one measured quantity (such as a voltage) into another (such as a temperature). This conversion process is imprecise and can introduce considerable noise. Furthermore, systematic errors are introduced by changes in external conditions or by ageing of the sensor. To reduce such errors, one can either re-calibrate the sensor [25] or perform data-driven cleaning and uncertainty modeling [34]. In addition, the data may sometimes be incomplete because of the periodic failure of some of the sensors. A detailed discussion of methods for cleaning conventional sensor data is provided in Chapter 2 of this book.
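As a concrete illustration of data-driven cleaning (as opposed to re-calibration), the following is a minimal sketch that suppresses conversion noise with a rolling median; the window size and deviation threshold are illustrative choices, not values prescribed by [34].

```python
# A minimal sketch of data-driven cleaning for a noisy temperature stream.
# The window size and outlier threshold are illustrative assumptions.
from statistics import median

def clean_stream(readings, window=5, max_dev=2.0):
    """Replace readings that deviate too far from a rolling median."""
    buf = []
    for r in readings:
        buf.append(r)
        if len(buf) > window:
            buf.pop(0)
        m = median(buf)
        # Treat a reading far from the local median as conversion noise
        # and substitute the median as the cleaned value.
        yield m if abs(r - m) > max_dev else r

noisy = [21.0, 21.2, 35.9, 21.1, 20.9, 21.3]   # 35.9 is a spurious spike
print(list(clean_stream(noisy)))
```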
RFID data is even noisier than conventional sensor data, because of the errors inherent in the reader-tag communication process. Furthermore, since tags are repeatedly scanned by the reader even when they are stationary, the data is massively redundant. Techniques for cleaning RFID data are discussed in [9]; here we provide only a brief discussion of these issues and refer the reader to the other chapters for more details. For many different kinds of sources, such as conventional sensor data, RFID data, and privacy-preserving data mining, probabilistic modeling of uncertainty has emerged as a preferred solution in a variety of contexts [6, 34, 66], owing to recent advances in the field of probabilistic databases [7]. The broad idea is that when the data is represented in a probabilistic format (which reflects its errors and uncertainty), it can be used more effectively for mining purposes. Nevertheless, probabilistic databases are still an emerging field and, as far as we are aware, all commercial solutions work with conventional (deterministic) representations of the sensor data. Therefore, more direct solutions are also required, which clean the data as deterministic entities.
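Since probabilistic databases are referenced only at a high level here, the following is a minimal sketch of what a probabilistic representation might look like, assuming a Gaussian error model; the schema, parameter values, and query are illustrative assumptions, not drawn from [6, 34, 66] or [7].

```python
# A minimal sketch of a probabilistic representation: each uncertain
# reading is stored as a Gaussian (mean, std) rather than a single value,
# and queries return probabilities instead of booleans.
from math import erf, sqrt

def prob_above(mean, std, threshold):
    """P(reading > threshold) under a Gaussian error model."""
    z = (threshold - mean) / (std * sqrt(2.0))
    return 0.5 * (1.0 - erf(z))

# Uncertain temperature tuples: (sensor_id, mean, std) -- illustrative data.
readings = [("s1", 29.4, 0.5), ("s2", 30.2, 1.5)]
# Probabilistic analogue of "select all readings with temp > 30":
for sid, mu, sigma in readings:
    print(sid, round(prob_above(mu, sigma, 30.0), 3))
```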
In order to address the issue of lost readings in RFID data, many data cleaning systems [47, 120] use a temporal smoothing filter, in which a sliding window over the reader's data stream interpolates for the lost readings of each tag within the time window. The idea is to give each tag more opportunities to be read within the smoothing window.
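A minimal sketch of such a fixed-window smoothing filter follows; the epoch-based layout, window length, and tag identifiers are illustrative assumptions rather than the exact design of the systems in [47, 120].

```python
# Fixed-window temporal smoothing for RFID readings: a tag is reported
# as present at an epoch if it was read at least once within the
# trailing window, interpolating over dropped readings.
def smooth(reads_by_epoch, window=5):
    """reads_by_epoch: list of sets of tag ids read at each epoch."""
    last_seen = {}
    for epoch, tags in enumerate(reads_by_epoch):
        for tag in tags:
            last_seen[tag] = epoch
        # A tag seen within the window is still considered present
        # even if this epoch missed it.
        present = {t for t, e in last_seen.items() if epoch - e < window}
        yield epoch, present

stream = [{"A", "B"}, {"A"}, set(), {"A"}, {"B"}]   # B dropped at epochs 1-3
for epoch, present in smooth(stream, window=3):
    print(epoch, sorted(present))
```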
Since the window size is a critical parameter, the work in [55] proposes SMURF (Statistical sMoothing for Unreliable RFid data), an adaptive smoothing filter for raw RFID data streams. This technique determines the most effective window size automatically, and continuously adapts it over the course of the RFID stream.
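The following sketch captures only the statistical intuition behind adaptive window sizing, not the actual SMURF algorithm: if reads of a present tag are modeled as Bernoulli trials with an estimated read rate p_hat, the smallest window that detects the tag with probability at least 1 - delta can be computed directly. The parameter names and guard bounds are assumptions.

```python
# Sketch of adaptive window sizing under a Bernoulli read model (a
# simplification, not the full SMURF estimator): choose the smallest w
# with (1 - p_hat)**w <= delta, i.e. w >= ln(delta) / ln(1 - p_hat).
from math import ceil, log

def window_size(p_hat, delta=0.05):
    """Smallest window that detects a present tag w.p. >= 1 - delta."""
    p_hat = max(min(p_hat, 0.999), 1e-3)   # guard against degenerate rates
    return max(1, ceil(log(delta) / log(1.0 - p_hat)))

# A reliably read tag needs a short window; an unreliable one a long window.
print(window_size(0.8))    # -> 2
print(window_size(0.1))    # -> 29
```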
Many of these cleaning methods use declarative specifications in the cleaning process and are discussed in [54, 56, 55]. The broad idea is to specify the cleaning stages with high-level declarative queries over relational data streams.
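The cited systems express each stage as a declarative query over a relational stream; the sketch below only mimics that staging idea with composable Python stream operators, so the stage names and operators are illustrative analogies rather than the actual query languages of [54, 56, 55].

```python
# Cleaning expressed as a pipeline of composable stream stages, in the
# spirit of declarative staged cleaning. Stage names are illustrative.
def smooth_stage(stream, window=3):
    """Interpolate dropped readings within a trailing window."""
    last_seen = {}
    for epoch, tags in stream:
        for t in tags:
            last_seen[t] = epoch
        yield epoch, {t for t, e in last_seen.items() if epoch - e < window}

def dedup_stage(stream):
    """Suppress redundant repeats: emit a tag only when it (re)appears."""
    prev = set()
    for epoch, tags in stream:
        yield epoch, tags - prev
        prev = tags

raw = enumerate([{"A"}, {"A"}, set(), {"A", "B"}])
for epoch, arrivals in dedup_stage(smooth_stage(raw)):
    print(epoch, sorted(arrivals))
```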