Figure 2.6. Detected anomalies based on 2-degree Chebyshev regression.
3.2.2 Probabilistic Models. In sensor data cleaning, inferring
sensor values is perhaps the most important task, since systems can
then detect and clean dirty sensor values by comparing raw sensor
values with the corresponding inferred sensor values. Figure 2.7 shows an
example of the data cleaning process using probabilistic models. At time
t_i = 6, the probabilistic model infers a probability distribution from the
previous values v_{2j}, ..., v_{5j} in the sliding window. The expected value
of v_{6j} under this distribution (e.g., the mean of the Gaussian
distribution predicted for that time) is then taken as the inferred sensor
value for sensor s_j.
Next, the anomaly detector checks whether the raw sensor value v_{6j}
falls within an acceptable range around the inferred value, that is,
whether the value is normal. For instance, the 3σ range covers 99.7%
of the density of the Gaussian distribution in the figure, which is where
v_{6j} is expected to appear. Since v_{6j} lies within this range,
the data cleaning process does not consider it an error. At t_i = 7,
the window slides and now contains the raw sensor values v_{3j}, ..., v_{6j}.
By repeating the same process, the anomaly detector finds that v_{7j} lies
outside the error bound (3σ range) of the inferred probability distribution,
so v_{7j} is identified as an anomaly [57].
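
The sliding-window check described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of [57]: it assumes the inferred distribution is a Gaussian whose mean and standard deviation are estimated directly from the raw values in the window, and the function name detect_anomalies, the window size of 4, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=4, k=3.0):
    """Flag each value that lies outside mean +/- k*std of the
    preceding `window` raw values (assumed Gaussian model)."""
    results = []
    for i in range(window, len(values)):
        history = values[i - window:i]      # raw values in the sliding window
        inferred = mean(history)            # expected value = inferred sensor value
        sigma = stdev(history)              # sample standard deviation of the window
        # If sigma is 0 (constant window), any deviation at all is flagged.
        is_anomaly = abs(values[i] - inferred) > k * sigma
        results.append((i + 1, inferred, is_anomaly))  # i + 1 = 1-based time index
    return results


# Hypothetical readings for one sensor at times t = 1, ..., 7.
readings = [20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 35.0]
for t, inferred, is_anomaly in detect_anomalies(readings):
    print(t, round(inferred, 2), "anomaly" if is_anomaly else "normal")
```

With these sample readings, the value at t = 6 falls inside the 3σ range of the window covering t = 2, ..., 5, while the value at t = 7 falls outside the range of the window covering t = 3, ..., 6 and is flagged. As in the text, the window keeps the raw values; a cleaning step could instead replace a flagged value with the inferred one before sliding further.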
A vast body of research work has utilized probabilistic models for
computing inferred values. The Kalman filter is perhaps one of the most
 