tially non-linear) functions computed over the average of vectors that
describe the local behavior at each sensor node, and the handling of
different similarity functions (useful for the outlier detection task) in
the distributed setting of a WSN: each sensor is assigned a zone, which
is locally monitored, and if no sensor identifies a threshold violation in
its corresponding zone, then the overall monitored function will not
have exceeded the threshold either. Under the proposed framework, we
can identify sensor nodes whose sensed data values (either the recent
history of readings, or the vector of the currently sensed values)
are not similar to the corresponding values of other similar nodes
in the network. Several different similarity measures can be efficiently
supported, including $L_1$, $L_2$, $L_\infty$, cosine similarity, extended
Jaccard coefficient, and correlation coefficient.
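To make the measures listed above concrete, the following is a minimal Python sketch using their standard textbook definitions. It is not the implementation of the framework discussed here: the function names are chosen for illustration, and the extended Jaccard coefficient is assumed to be the usual Tanimoto form, x·y / (|x|^2 + |y|^2 - x·y).

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def l1(x, y):
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    """L2 (Euclidean) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linf(x, y):
    """L-infinity (maximum coordinate difference) distance."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (undefined for zero vectors)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def extended_jaccard(x, y):
    """Extended Jaccard (Tanimoto) coefficient, assumed standard form."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)

def correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])
```

In an outlier detection setting, x and y would hold the readings of two sensor nodes, either a window of recent history or the current vector of sensed values. Note that the first three functions are distances (smaller means more similar), while the last three are similarities (larger means more similar).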
3.4 Processing Uncertain Data Series
In several different domains, such as manufacturing plants and
engineering facilities, sensor networks are being deployed to ensure
efficiency, product quality and safety [57]: unexpected vibration patterns
in production machines, or changes in the composition of chemicals in
industrial processes, are used to identify in advance possible failures, sug-
gesting repairs or replacements. However, sensor readings are inherently
imprecise because of the noise introduced by the equipment itself [18].
Previous work has shown that treating value uncertainty as a first-class
citizen can lead to better results in terms of quality and efficiency [57,
91, 94, 96]. Since value uncertainty is inherent in WSN data, in the
following paragraphs we discuss some recent works on processing data
series with uncertain values. The focus of these works is on similarity
matching, which serves as the basis for developing various more complex
analysis and mining algorithms (e.g., classification, clustering, and
outlier detection).
Two main approaches have emerged for modeling uncertain data se-
ries. In the first, a Probability Density Function (PDF) over the uncer-
tain values is estimated by using some a priori knowledge [112, 105, 83].
In the second, the uncertain data distribution is summarized by repeated
measurements (i.e., samples) [8]. We discuss those in more detail below.
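The two modeling approaches can be contrasted with a small Python sketch. Everything in it is an illustrative assumption rather than the method of any cited work: the PDF-based model is represented with a Gaussian per timestamp (`GaussianPoint`), the sample-based model with a list of repeated measurements per timestamp, and `prob_within` is a hypothetical Monte Carlo estimate of how often two uncertain series fall within Euclidean distance `eps` of each other.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class GaussianPoint:
    """One timestamp under the PDF-based model (Gaussian is an assumption)."""
    mean: float
    std: float

def realize_pdf(series, rng):
    """Draw one possible exact series from a list of GaussianPoint."""
    return [rng.gauss(p.mean, p.std) for p in series]

def realize_samples(series, rng):
    """Draw one possible exact series by picking one of the repeated
    measurements stored for each timestamp."""
    return [rng.choice(observations) for observations in series]

def euclidean(x, y):
    """Euclidean distance between two exact (certain) series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def prob_within(a, b, realize, eps, trials=1000, seed=0):
    """Estimate P(distance(a, b) <= eps) by repeatedly realizing both
    uncertain series with the given `realize` function."""
    rng = random.Random(seed)
    hits = sum(
        euclidean(realize(a, rng), realize(b, rng)) <= eps
        for _ in range(trials)
    )
    return hits / trials
```

For example, `prob_within(a, b, realize_pdf, eps=1.0)` would be called with two lists of `GaussianPoint`, and `prob_within(a, b, realize_samples, eps=1.0)` with two lists of measurement lists. The sketch only shows that both representations describe a distribution at each timestamp; it makes no claim about how the reviewed techniques compute similarity.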
3.4.1 Similarity Matching for Uncertain Data Series.
Formally, an uncertain data series $T$ is defined as a sequence of random
variables $T = \langle t_1, t_2, \ldots, t_n \rangle$, where $t_i$ is the
random variable modeling the real-valued number at timestamp $i$. All three
models we review and compare fit under this general definition.