correctness of the results they produce. Consequently, these techniques
may fail to report some of the outliers in the data.
Classification-based
A method based on Bayesian classifiers is described by Elnahrawy et
al. [34]. This is a method for modeling and learning statistical contextual
information in WSNs, which can also be applied for the task of outlier
identification. The employed model assumes that the current reading
of each sensor is only influenced by the preceding reading of the same
sensor, and the readings of its immediate neighbors. This model is then
used to predict the highest probability class of the subsequent reading. If
the probability of this class is significantly different from the probability
(according to the model) of the actual reading, then this reading is
deemed an outlier.
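
The decision rule described above can be illustrated with a minimal sketch. It is not the implementation of Elnahrawy et al.: it assumes readings are discretized into equal-width classes, a single neighbor reading stands in for the immediate neighbors, and a smoothed conditional frequency table replaces the Bayesian classifier. The class name `ContextualDetector`, its parameters, and the `margin` threshold are all hypothetical.

```python
from collections import defaultdict


class ContextualDetector:
    """Toy contextual model over discretized readings (illustrative only)."""

    def __init__(self, num_classes, lo, hi, margin=0.5):
        self.num_classes = num_classes
        self.lo, self.hi = lo, hi
        # Hypothetical threshold on the gap between the probability of the
        # predicted class and the probability of the actual reading's class.
        self.margin = margin
        # counts[(prev_class, neighbor_class)][current_class]
        self.counts = defaultdict(lambda: [0] * num_classes)

    def to_class(self, value):
        # Map a raw reading to one of num_classes equal-width bins,
        # clamping values that fall outside [lo, hi].
        width = (self.hi - self.lo) / self.num_classes
        idx = int((value - self.lo) / width)
        return min(max(idx, 0), self.num_classes - 1)

    def train(self, prev, neighbor, current):
        # Record one observed transition in its context.
        ctx = (self.to_class(prev), self.to_class(neighbor))
        self.counts[ctx][self.to_class(current)] += 1

    def probabilities(self, prev, neighbor):
        # Class probabilities given the context, with add-one smoothing.
        ctx = (self.to_class(prev), self.to_class(neighbor))
        row = self.counts[ctx]
        total = sum(row) + self.num_classes
        return [(c + 1) / total for c in row]

    def is_outlier(self, prev, neighbor, actual):
        # Compare the most probable class with the class of the actual reading.
        probs = self.probabilities(prev, neighbor)
        return max(probs) - probs[self.to_class(actual)] > self.margin


# Train on a stable context: own and neighbor readings stay near 20.
detector = ContextualDetector(num_classes=4, lo=0.0, hi=40.0)
for _ in range(50):
    detector.train(prev=20.0, neighbor=20.0, current=20.0)

normal_flag = detector.is_outlier(prev=20.0, neighbor=20.0, actual=21.0)
spike_flag = detector.is_outlier(prev=20.0, neighbor=20.0, actual=35.0)
```

In this sketch a reading of 21 falls in the predicted class and is accepted, while a reading of 35 falls in a class the model considers improbable in that context and is flagged.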
Rajasegarar et al. [77] propose an alternative approach that uses a
Support Vector Machine (SVM) classifier. In this case the classification
model uses only the information from the past readings of the same
sensor node, and ignores the readings from the neighboring nodes.
A drawback of the classification-based approaches is the time and
computational effort required to train the model before it can be
used for outlier detection, which can be substantial. Note also that
for non-stationary data the model must be retrained as the data
changes, so this effort is continuous.
Data Distribution-based
A technique for outlier detection, based on learning statistical
properties of the spatio-temporal correlations of the sensor readings,
is proposed by Bettencourt et al. [12]. This technique is geared
towards ecological applications, where the sensed phenomena evolve
slowly over time and are spatio-temporally coherent. According to this
technique, sensors learn the distributions of differences among their
own readings (over time), as well as the distributions of differences
between their readings and the readings of their neighbors. Comparing
the current readings to these distributions then allows sensors to
identify local outliers using a significance test and a user-specified
threshold.
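
A rough sketch of this idea follows. It is an illustration under stated assumptions, not the procedure of Bettencourt et al.: the significance test is approximated by an empirical two-sided tail probability, a single neighbor is used, and a reading is flagged only when both its temporal and its spatial difference are improbable. The names `DifferenceModel`, `is_local_outlier`, and the threshold `alpha` are hypothetical.

```python
class DifferenceModel:
    """Empirical distribution of observed differences (illustrative only)."""

    def __init__(self):
        self.samples = []

    def learn(self, diff):
        self.samples.append(diff)

    def tail_probability(self, diff):
        # Fraction of learned differences at least as large in magnitude.
        # With no history, nothing can be called unusual.
        if not self.samples:
            return 1.0
        extreme = sum(1 for s in self.samples if abs(s) >= abs(diff))
        return extreme / len(self.samples)


def is_local_outlier(temporal, spatial, prev_reading, neighbor_reading,
                     reading, alpha=0.05):
    # alpha plays the role of the user-specified threshold.
    p_time = temporal.tail_probability(reading - prev_reading)
    p_space = spatial.tail_probability(reading - neighbor_reading)
    # Assumption of this sketch: flag only if both differences are unusual.
    return p_time < alpha and p_space < alpha


# Slowly evolving, spatially coherent readings (made-up values).
own = [20.0, 20.1, 20.3, 20.2, 20.4, 20.3, 20.5, 20.4]
neighbor = [20.1, 20.2, 20.2, 20.3, 20.3, 20.4, 20.4, 20.5]

temporal, spatial = DifferenceModel(), DifferenceModel()
for i in range(len(own)):
    if i > 0:
        temporal.learn(own[i] - own[i - 1])   # difference over time
    spatial.learn(own[i] - neighbor[i])       # difference to the neighbor

spike_flag = is_local_outlier(temporal, spatial, own[-1], 20.5, 25.0)
normal_flag = is_local_outlier(temporal, spatial, own[-1], 20.5, 20.5)
```

Here a jump to 25.0 is larger than every learned difference in both distributions and is flagged, whereas a reading of 20.5 is consistent with the learned differences and is accepted.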
Subramaniam et al. [93] study the case where we wish to identify
(among all sensor readings in a sliding window) those values that have
very few near neighbors [55], namely, distance-based outliers; or those
values whose near neighborhood is significantly less dense than their
extended neighborhood [71], namely, density-based outliers. Note that
these definitions do not require any prior knowledge of the underlying
data distributions. In order to solve the problem (for both definitions