REAL-TIME DATA ANALYTICS IN SENSOR NETWORKS - Managing and Mining Sensor Data

correctness of the results they produce. Consequently, these techniques may fail to report some of the outliers in the data.

Classification-based

Elnahrawy et al. [34] describe a method based on Bayesian classifiers for modeling and learning statistical contextual information in WSNs, which can also be applied to outlier identification. The model assumes that the current reading of each sensor is influenced only by the preceding reading of the same sensor and by the readings of its immediate neighbors. The model is then used to predict the most probable class of the subsequent reading. If the probability of this class differs significantly from the probability (according to the model) of the actual reading, the reading is deemed an outlier.
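A minimal Python sketch of this idea follows. It is not the model of Elnahrawy et al. [34]. It rests on several assumptions of our own. Readings are discretized into fixed-width bins that serve as the classes. The conditional probability factorizes in naive-Bayes fashion over the sensor's previous reading and its neighbors' readings. "Significantly different" is replaced by a hypothetical likelihood-ratio threshold `ratio`. All names (`NaiveBayesOutlierDetector`, `bin_width`, `ratio`) are illustrative.

```python
import math
from collections import defaultdict


class NaiveBayesOutlierDetector:
    """Hypothetical sketch: naive Bayes over discretized sensor readings.

    Features are the sensor's previous reading followed by its neighbors'
    readings; the class is the bin of the sensor's current reading.
    """

    def __init__(self, bin_width, num_features, alpha=1.0):
        self.bin_width = bin_width   # assumed discretization step
        self.alpha = alpha           # Laplace smoothing constant
        self.class_counts = defaultdict(int)
        # feature_counts[i][(feature_bin, class_bin)] -> count
        self.feature_counts = [defaultdict(int) for _ in range(num_features)]
        self.feature_values = [set() for _ in range(num_features)]
        self.total = 0

    def to_class(self, reading):
        return int(reading // self.bin_width)

    def train(self, features, reading):
        cls = self.to_class(reading)
        self.class_counts[cls] += 1
        self.total += 1
        for i, value in enumerate(features):
            v = self.to_class(value)
            self.feature_values[i].add(v)
            self.feature_counts[i][(v, cls)] += 1

    def posterior(self, features):
        n_classes = len(self.class_counts)
        log_scores = {}
        for cls, count in self.class_counts.items():
            # Smoothed log prior P(class).
            score = math.log((count + self.alpha) / (self.total + self.alpha * n_classes))
            for i, value in enumerate(features):
                # Smoothed log P(feature_i | class); the +1 reserves mass for unseen bins.
                num = self.feature_counts[i].get((self.to_class(value), cls), 0) + self.alpha
                den = count + self.alpha * (len(self.feature_values[i]) + 1)
                score += math.log(num / den)
            log_scores[cls] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {cls: math.exp(s - top) / z for cls, s in log_scores.items()}

    def is_outlier(self, features, reading, ratio=10.0):
        post = self.posterior(features)
        actual = post.get(self.to_class(reading), 0.0)
        # Outlier if the actual class is unseen, or far less likely than the best class.
        return actual == 0.0 or max(post.values()) / actual > ratio
```

For example, after training on histories where the sensor and its two neighbors all report the same bin (20, 21 or 22), the features `[21, 21, 21]` with an actual reading of 22.3 are flagged, while 21.4 is not.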

Rajasegarar et al. [77] propose an alternative approach that uses a Support Vector Machine (SVM) classifier. In this case the classification model uses only past readings of the same sensor node and ignores the readings of neighboring nodes.
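The sketch below illustrates a one-class SVM that, as described above, sees only the node's own history, encoded as sliding windows of past readings. The formulation is our assumption and [77] may differ in kernel and training details. We use the fixed-center ("quarter-sphere") one-class SVM with a linear kernel on per-dimension standardized vectors. With a linear kernel its primal reduces to choosing R² as a quantile of squared norms, so no QP solver is needed. `lag_features`, `QuarterSphereSVM` and `nu` are hypothetical names.

```python
import math


def lag_features(readings, window):
    """Slide a window over one node's own past readings (no neighbor data)."""
    return [readings[i:i + window] for i in range(len(readings) - window + 1)]


class QuarterSphereSVM:
    """Fixed-center one-class SVM, linear kernel (illustrative sketch).

    Primal: minimize R^2 + 1/(nu*m) * sum_i max(0, ||z_i||^2 - R^2), where z_i
    are the standardized training vectors and the sphere is centered at the
    origin. For a linear kernel this 1-D convex problem is solved exactly by
    taking R^2 as the (1 - nu) quantile of the squared norms.
    """

    def __init__(self, nu=0.05):
        self.nu = nu  # at most a fraction nu of training vectors ends up outside

    def fit(self, vectors):
        m, dims = len(vectors), len(vectors[0])
        self.mean = [sum(v[d] for v in vectors) / m for d in range(dims)]
        # Population standard deviation per dimension; 1.0 guards constant columns.
        self.std = [
            math.sqrt(sum((v[d] - self.mean[d]) ** 2 for v in vectors) / m) or 1.0
            for d in range(dims)
        ]
        sq_norms = sorted(self._sq_norm(v) for v in vectors)
        k = min(m - 1, math.ceil((1 - self.nu) * m) - 1)
        self.radius_sq = sq_norms[max(k, 0)]
        return self

    def _sq_norm(self, v):
        return sum(((x - mu) / s) ** 2 for x, mu, s in zip(v, self.mean, self.std))

    def is_outlier(self, vector):
        return self._sq_norm(vector) > self.radius_sq
```

Because the model never looks at neighboring nodes, it can run entirely on the node. The price is that it cannot use spatial correlation to tell a local event apart from a faulty reading.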

A drawback of classification-based approaches is the time and computational effort required to train the model before it can be used for outlier detection, and in certain cases this effort can be rather high. Moreover, for non-stationary data the model has to be retrained as the data evolve, so this effort is incurred continuously.

Data Distribution-based

Bettencourt et al. [12] propose an outlier detection technique based on learning statistical properties of the spatio-temporal correlations among sensor readings. The technique is geared towards ecological applications, where the sensed phenomena evolve slowly over time and are spatio-temporally coherent. Each sensor learns the distribution of differences among its own readings over time, as well as the distribution of differences between its readings and those of its neighbors. Comparing the current readings against these distributions, using a significance test and a user-specified threshold, then allows the sensor to identify local outliers.
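A minimal sketch of such a per-sensor test follows, under assumptions of our own. The learned distributions are kept as empirical samples of absolute differences. The significance test is an empirical tail probability on |difference|. A reading is flagged if its temporal p-value or any per-neighbor p-value falls below the user threshold `alpha`, with no multiple-testing correction. The class and method names are hypothetical and do not describe the implementation in [12].

```python
from bisect import bisect_left


class EmpiricalDifferenceModel:
    """Stores the empirical distribution of absolute reading differences."""

    def __init__(self, differences):
        self.sorted_abs = sorted(abs(d) for d in differences)

    def p_value(self, diff):
        # Empirical tail probability P(|D| >= |diff|), with add-one smoothing
        # so that a never-seen magnitude gets a small but non-zero p-value.
        n = len(self.sorted_abs)
        at_least = n - bisect_left(self.sorted_abs, abs(diff))
        return (at_least + 1) / (n + 1)


class LocalOutlierTester:
    """Per-sensor temporal and spatial difference tests (illustrative sketch)."""

    def __init__(self, own_readings, neighbor_readings, alpha=0.01):
        # Temporal model: differences between consecutive readings of this sensor.
        self.temporal = EmpiricalDifferenceModel(
            [b - a for a, b in zip(own_readings, own_readings[1:])]
        )
        # Spatial models: one per neighbor, differences taken at the same time step.
        self.spatial = {
            nid: EmpiricalDifferenceModel([o - r for o, r in zip(own_readings, readings)])
            for nid, readings in neighbor_readings.items()
        }
        self.alpha = alpha  # user-specified significance threshold

    def is_outlier(self, previous, current, neighbor_current):
        p_values = [self.temporal.p_value(current - previous)]
        for nid, value in neighbor_current.items():
            p_values.append(self.spatial[nid].p_value(current - value))
        # Assumed combination rule: flag if any single test is significant.
        return min(p_values) < self.alpha
```

Requiring only one significant test keeps the sketch short. A stricter rule, such as a majority of neighbor tests, would make a healthy sensor less likely to be flagged because of a single faulty neighbor.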

Subramaniam et al. [93] study the problem of identifying, among all sensor readings in a sliding window, either the values that have very few near neighbors [55] (distance-based outliers) or the values whose near neighborhood is significantly less dense than their extended neighborhood [71] (density-based outliers). Note that these definitions do not require any prior knowledge of the underlying data distributions. In order to solve the problem (for both definitions
