Database Reference
In-Depth Information
bors and the data it has received. Following some rounds of peer-to-peer
communications, all the nodes in the network converge to the final list of
global outliers. This technique guarantees that it will correctly identify
all outliers, but only under the assumptions that each node has accurate
knowledge of its nearest neighbors, the communications are reliable, and
that the data remains static long enough for the algorithm to converge.
In a similar setting, Zhang et al. [109] describe a technique for iden-
tification of global outliers, where outliers are defined as the n points
with the largest distance to their k th nearest neighbor. This technique
assumes the existence of an aggregation tree, which is used as the com-
munication structure among the nodes in the network. The nodes use
the aggregation tree to send local outliers and supporting information to
their parents, with the root node eventually collecting all the informa-
tion. At this point the sink is able to calculate the top- n global outlier
candidates, which transmits back to all the nodes in the network for
verification. If corrections need to be made, these have to be sent to
the sink, which will then adjust the candidate outlier list and repeat the
verification process. The end result is guaranteed to be correct as long
as the network topology does not change, and the algorithm converges
to the solution faster than the data gets updated (which implies the need
for a rather slow update rate).
A subsequent study [85] takes a more pragmatic approach, removing
the assumptions mentioned in the previous approaches. The goal is still
to find global outliers. An outlier is defined as a point whose distance
from its k th nearest neighbor is more than a distance threshold d ;or
alternatively, as a point p , such that there exist no more than n other
points with distance to their k th nearest neighbors larger than the dis-
tance of p to its k th nearest neighbor. This approach is based on the use
of an equi-width histogram that can effectively aggregate and summarize
the sensor data readings. The histogram is built in the sink, after the
sink agrees with all the sensor nodes on the boundaries of the histogram
and its buckets. The histogram is then used by the sink in order to
prune the search space of outliers, by eliminating all points that can-
not possibly be outliers, as well as identifying points that are certainly
outliers. For the points for which no definite answer can be given, the
sink will explicitly ask the sensor nodes in the network, in an additional
round of computations.
Burdakis et al. [17] present an outlier detection framework that can
provide hard guarantees on the results. It is based on the Geometric Ap-
proach [81], which allows the development of much more ecient meth-
ods (in terms of communication cost) than the ones presented above.
The Geometric Approach enables the monitoring of complex (poten-
Search WWH ::




Custom Search