User Interface: The user interface plays two roles in the data cleaning
process. First, it takes all necessary inputs from users to perform data
cleaning, e.g., name of sensor data and parameter settings for models.
Second, the results of data cleaning, such as 'dirty' sensor values
captured by the anomaly detector, are presented using graphs and tables,
so that users can confirm whether each candidate dirty value is an
actual error. The confirmed results are then stored in (or removed
from) the underlying data storage or materialized views.
Anomaly Detector: The anomaly detector is a core component in
sensor data cleaning. It uses models for detecting abnormal data values.
The anomaly detector works in online as well as offline mode. In the
online mode, whenever a new sensor value is delivered to the stream
processing engine, the dirtiness of this value is investigated and the
errors are filtered out instantly. In the offline mode, the data is cleaned
periodically, for instance, once per day. In the following subsections, we
will review popular models used for online anomaly detection.
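As a rough illustration of the two modes, the Python sketch below filters
values one at a time (online) and scans a stored batch for candidates
(offline). The names deviation_model, clean_online, and clean_offline, the
window size, and the threshold are assumptions made for this example; the
moving-mean check is only a stand-in and is not one of the models reviewed
in the text.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical model interface: given the accepted history and a new value,
# return True when the new value looks dirty.
Model = Callable[[List[float], float], bool]


def deviation_model(window: int = 5, threshold: float = 10.0) -> Model:
    """Illustrative stand-in model (an assumption, not from the text):
    flag a value that differs from the mean of the last `window`
    accepted values by more than `threshold`."""
    def is_dirty(history: List[float], value: float) -> bool:
        recent = history[-window:]
        if not recent:
            return False  # nothing to compare against yet
        mean = sum(recent) / len(recent)
        return abs(value - mean) > threshold
    return is_dirty


def clean_online(stream: Iterable[float], model: Model) -> Iterator[float]:
    """Online mode: check each value as it arrives and drop errors at once."""
    history: List[float] = []
    for value in stream:
        if model(history, value):
            continue  # filtered out instantly
        history.append(value)
        yield value


def clean_offline(batch: List[float], model: Model) -> Tuple[List[float], List[float]]:
    """Offline mode: scan a stored batch (e.g. once per day) and return the
    clean values plus the dirty candidates for later confirmation."""
    clean: List[float] = []
    candidates: List[float] = []
    for value in batch:
        if model(clean, value):
            candidates.append(value)
        else:
            clean.append(value)
    return clean, candidates


readings = [20.0, 20.5, 21.0, 95.0, 20.8]
online_result = list(clean_online(readings, deviation_model()))
offline_clean, offline_candidates = clean_offline(readings, deviation_model())
```

In this sketch the offline pass keeps the flagged values instead of
discarding them, so that they could be shown to a user for confirmation.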
Stream Processing Engine: The stream processing engine maintains
streaming sensor data, while serving as the main platform where
the other system components can cooperatively perform data cleaning.
The anomaly detector is typically embedded into the stream processing
engine; it may also be implemented as a built-in function in a database
system.
Data Storage: The data storage maintains not only sensor values,
but also the corresponding cleaned data, typically in materialized views.
This is because applications on sensor networks often need to repeatedly
perform data cleaning over the same data using different parameter
settings for the models, especially when the previous parameter settings
later turn out to be inappropriate. Therefore, it is important for the
system to store cleaned data in database views without changing the
original data, so that data cleaning can be performed again at any point
in time (or over any time interval) as necessary.
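The idea of keeping the original data untouched while re-cleaning with
new parameters can be sketched with Python's sqlite3 module. The table
name raw_readings, the view name cleaned_readings, and the range-based
cleaning rule are assumptions made for this example. SQLite offers only
ordinary (non-materialized) views, so a plain view stands in here for the
materialized views mentioned in the text.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory store holding the original sensor values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_readings (sensor_id TEXT, ts INTEGER, value REAL)"
    )
    return conn


def define_cleaned_view(conn: sqlite3.Connection, low: float, high: float) -> None:
    """(Re)define the cleaned view for a new parameter setting.

    The cleaning rule (keep values inside [low, high]) is a placeholder.
    CREATE VIEW cannot take bound parameters, so the limits are formatted
    into the statement after being coerced to float.
    """
    conn.execute("DROP VIEW IF EXISTS cleaned_readings")
    conn.execute(
        "CREATE VIEW cleaned_readings AS "
        "SELECT sensor_id, ts, value FROM raw_readings "
        f"WHERE value BETWEEN {float(low)} AND {float(high)}"
    )


def count_rows(conn: sqlite3.Connection, relation: str) -> int:
    """Count rows in a table or view (relation name is trusted here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]


conn = create_store()
conn.executemany(
    "INSERT INTO raw_readings VALUES (?, ?, ?)",
    [("s1", 1, 20.0), ("s1", 2, 20.5), ("s1", 3, 95.0), ("s1", 4, 21.0)],
)

define_cleaned_view(conn, 0.0, 50.0)    # first parameter setting
strict_count = count_rows(conn, "cleaned_readings")

define_cleaned_view(conn, 0.0, 100.0)   # setting revised later
relaxed_count = count_rows(conn, "cleaned_readings")

raw_count = count_rows(conn, "raw_readings")  # original data is unchanged
```

Because only the view definition changes, the value 95.0 that the first
setting excluded is still available when the setting is relaxed.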
3.2 Models for Sensor Data Cleaning
This subsection reviews popular models that are widely used in the
sensor data cleaning process.
3.2.1 Regression Models. As sensor values are a representation
of physical processes, it is naturally possible to uncover the following
properties: continuity of the sampling processes and correlations between
different sampling processes. In principle, regression-based models