Figure 2.6. Detected anomalies based on 2-degree Chebyshev regression.
3.2.2 Probabilistic Models.
In sensor data cleaning, inferring sensor values is perhaps the most important task, since systems can then detect and clean dirty sensor values by comparing raw sensor values with the corresponding inferred sensor values. Figure 2.7 shows an example of the data cleaning process using probabilistic models. At time $t_i = 6$, the probabilistic model infers a probability distribution from the previous values $v_{2j}, \ldots, v_{5j}$ in the sliding window. The expected value $\hat{v}_{6j}$ of this distribution (e.g., the mean of the inferred Gaussian distribution) is then taken as the inferred sensor value for sensor $s_j$.
Next, the anomaly detector checks whether the raw sensor value $v_{6j}$ falls within an acceptable range around the inferred value, that is, whether the value is normal. For instance, the $3\sigma$ range covers 99.7% of the density of the distribution in the figure, which is where $v_{6j}$ is expected to appear. Because $v_{6j}$ lies within this range, the data cleaning process considers that $v_{6j}$ is not an error. At $t_i = 7$, the window slides and now contains the raw sensor values $v_{3j}, \ldots, v_{6j}$. Repeating the same process, the anomaly detector finds that $v_{7j}$ lies outside the error bound ($3\sigma$ range) of the inferred probability distribution, so it is identified as an anomaly [57].
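The sliding-window check described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the distribution is taken to be a Gaussian whose mean and standard deviation are estimated directly from the window, which is a stand-in for whatever probabilistic model a real system would use. The function name `detect_anomalies`, the parameters `window` and `k`, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=4, k=3.0):
    """Return the indices of values that fall outside mean +/- k * std
    of the preceding `window` raw values (a simple Gaussian assumption)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # raw values in the sliding window
        inferred = mean(history)         # expected value = inferred sensor value
        sigma = stdev(history)           # sample standard deviation of the window
        # With sigma == 0 (constant window), any differing value is flagged.
        if abs(values[i] - inferred) > k * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature-like readings with one obvious spike at the end.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0]
print(detect_anomalies(readings))
```

As in the example in the text, the window here always slides over the raw values, so a detected anomaly still enters later windows; a system that cleans data online might instead substitute the inferred value before sliding.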
A vast body of research has used probabilistic models for computing inferred values. The Kalman filter is perhaps one of the most