Conventional sensor data is noisy because sensor readings are often derived by converting one measured quantity (such as a voltage) into another (such as a temperature). This conversion process is imprecise and can introduce considerable noise. Furthermore, systematic errors are introduced by changes in external conditions or by ageing of the sensor. To reduce such errors, one can either re-calibrate the sensor [25] or perform data-driven cleaning and uncertainty modeling [34]. In addition, the data may sometimes be incomplete because of the periodic failure of some of the sensors. A detailed discussion of methods for cleaning conventional sensor data is provided in Chapter 2 of this book.
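As a concrete illustration of data-driven cleaning (as opposed to re-calibration), the following is a minimal sketch that suppresses conversion noise with a rolling median; the window size and deviation threshold are illustrative choices, not values prescribed by [34].

```python
# A minimal sketch of data-driven cleaning for a noisy temperature stream.
# The window size and outlier threshold are illustrative assumptions.
from statistics import median

def clean_stream(readings, window=5, max_dev=2.0):
    """Replace readings that deviate too far from a rolling median."""
    buf = []
    for r in readings:
        buf.append(r)
        if len(buf) > window:
            buf.pop(0)
        m = median(buf)
        # Treat a reading far from the local median as conversion noise
        # and substitute the median as the cleaned value.
        yield m if abs(r - m) > max_dev else r

noisy = [21.0, 21.2, 35.9, 21.1, 20.9, 21.3]   # 35.9 is a spurious spike
print(list(clean_stream(noisy)))
```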
RFID data is even noisier than conventional sensor data, because of the errors inherent in the reader-tag communication process. Furthermore, since tags are repeatedly scanned by the reader even when they are stationary, the data is massively redundant. Techniques for cleaning RFID data are discussed in [9]; here we provide only a brief discussion of these issues and refer the reader to the other chapters for more details. For many different kinds of sources, such as conventional sensor data, RFID data, and privacy-preserving data mining, probabilistic modeling of uncertainty has emerged as a preferred solution in a variety of contexts [6, 34, 66], owing to recent advances in the field of probabilistic databases [7]. The broad idea is that when the data is represented in a probabilistic format (which reflects its errors and uncertainty), it can be used more effectively for mining purposes. Nevertheless, probabilistic databases are still an emerging field and, as far as we are aware, all commercial solutions work with conventional (deterministic) representations of the sensor data. Therefore, more direct solutions are also required, which clean the data as deterministic entities.
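Since probabilistic databases are referenced only at a high level here, the following is a minimal sketch of what a probabilistic representation might look like, assuming a Gaussian error model; the schema, parameter values, and query are illustrative assumptions, not drawn from [6, 34, 66] or [7].

```python
# A minimal sketch of a probabilistic representation: each uncertain
# reading is stored as a Gaussian (mean, std) rather than a single value,
# and queries return probabilities instead of booleans.
from math import erf, sqrt

def prob_above(mean, std, threshold):
    """P(reading > threshold) under a Gaussian error model."""
    z = (threshold - mean) / (std * sqrt(2.0))
    return 0.5 * (1.0 - erf(z))

# Uncertain temperature tuples: (sensor_id, mean, std) -- illustrative data.
readings = [("s1", 29.4, 0.5), ("s2", 30.2, 1.5)]
# Probabilistic analogue of "select all readings with temp > 30":
for sid, mu, sigma in readings:
    print(sid, round(prob_above(mu, sigma, 30.0), 3))
```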
In order to address the issue of lost readings in RFID data, many data cleaning systems [47, 120] use a temporal smoothing filter, in which a sliding window over the reader's data stream interpolates for the lost readings of each tag within the time window. The idea is to give each tag more opportunities to be read within the smoothing window.
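A minimal sketch of such a fixed-window smoothing filter follows; the epoch-based layout, window length, and tag identifiers are illustrative assumptions rather than the exact design of the systems in [47, 120].

```python
# Fixed-window temporal smoothing for RFID readings: a tag is reported
# as present at an epoch if it was read at least once within the
# trailing window, interpolating over dropped readings.
def smooth(reads_by_epoch, window=5):
    """reads_by_epoch: list of sets of tag ids read at each epoch."""
    last_seen = {}
    for epoch, tags in enumerate(reads_by_epoch):
        for tag in tags:
            last_seen[tag] = epoch
        # A tag seen within the window is still considered present
        # even if this epoch missed it.
        present = {t for t, e in last_seen.items() if epoch - e < window}
        yield epoch, present

stream = [{"A", "B"}, {"A"}, set(), {"A"}, {"B"}]   # B dropped at epochs 1-3
for epoch, present in smooth(stream, window=3):
    print(epoch, sorted(present))
```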
Since the window size is a critical parameter, the work in [55] proposes SMURF (Statistical sMoothing for Unreliable RFid data), an adaptive smoothing filter for raw RFID data streams. This technique determines the most effective window size automatically, and continuously adapts it over the course of the RFID stream.
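The following sketch captures only the statistical intuition behind adaptive window sizing, not the actual SMURF algorithm: if reads of a present tag are modeled as Bernoulli trials with an estimated read rate p_hat, the smallest window that detects the tag with probability at least 1 - delta can be computed directly. The parameter names and guard bounds are assumptions.

```python
# Sketch of adaptive window sizing under a Bernoulli read model (a
# simplification, not the full SMURF estimator): choose the smallest w
# with (1 - p_hat)**w <= delta, i.e. w >= ln(delta) / ln(1 - p_hat).
from math import ceil, log

def window_size(p_hat, delta=0.05):
    """Smallest window that detects a present tag w.p. >= 1 - delta."""
    p_hat = max(min(p_hat, 0.999), 1e-3)   # guard against degenerate rates
    return max(1, ceil(log(delta) / log(1.0 - p_hat)))

# A reliably read tag needs a short window; an unreliable one a long window.
print(window_size(0.8))    # -> 2
print(window_size(0.1))    # -> 29
```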
Many of these cleaning methods use declarative specifications in the cleaning process and are discussed in [54, 56, 55]. The broad idea is to specify the cleaning stages with high-level declarative queries over relational data streams.
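The cited systems express each stage as a declarative query over a relational stream; the sketch below only mimics that staging idea with composable Python stream operators, so the stage names and operators are illustrative analogies rather than the actual query languages of [54, 56, 55].

```python
# Cleaning expressed as a pipeline of composable stream stages, in the
# spirit of declarative staged cleaning. Stage names are illustrative.
def smooth_stage(stream, window=3):
    """Interpolate dropped readings within a trailing window."""
    last_seen = {}
    for epoch, tags in stream:
        for t in tags:
            last_seen[t] = epoch
        yield epoch, {t for t, e in last_seen.items() if epoch - e < window}

def dedup_stage(stream):
    """Suppress redundant repeats: emit a tag only when it (re)appears."""
    prev = set()
    for epoch, tags in stream:
        yield epoch, tags - prev
        prev = tags

raw = enumerate([{"A"}, {"A"}, set(), {"A", "B"}])
for epoch, arrivals in dedup_stage(smooth_stage(raw)):
    print(epoch, sorted(arrivals))
```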