DISTRIBUTED DATA MINING IN SENSOR NETWORKS - Managing and Mining Sensor Data

Database Reference

In-Depth Information

algorithms for WSN. These are data and node clustering, data classifi-

cation and outlier detection in sensor networks.

The rest of this chapter is organized as follows. Section 2 discusses

several WSN clustering strategies including both node and data cluster-

ing. In the next section (Section 3) we discuss classification techniques

followed by several outlier detection techniques in Section 4. We con-

clude the chapter in Section 5.

2. Clustering in Wireless Sensor Networks

Cluster analysis is the unsupervised learning task of dividing a set

of objects into groups ( clusters ), such that a given quality criterion is

maximized. More formally, given a set of objects X =

{

x 1 ,...,x n }

,we

search for a clustering

C

=

{

C 1 ,...,C k }

with C i ⊆

X for i =1 ,...,k

0 . Partitional clustering

that maximizes a quality function q :2 X

→ R

algorithms yield disjunct clusters, i.e. C i ∩

= j , while

the clusters produced by non-partitional algorithms may overlap. More-

over, hierarchical clustering algorithms can subdivide clusters further,

resulting in increasingly more detailed groupings of the given objects.

Instead of creating new quality functions and algorithms for each par-

ticular application, clustering algorithms usually try to achieve more

general parameterized objectives. For example, an often stated objec-

tive is that objects in the same cluster should be more similar to each

other than objects from different clusters. With d : X

C j =

∅

for i

+

0 being

a given dissimilarity function between objects, this criterion can be for-

malized as minimizing the intra-cluster variance , also called the sum of

squares within (SSW), over all possible clusterings for a fixed number of

clusters k [32]:

×

X

→ R

k

C ∗ )= 1

2

d ( x 1 ,x 2 )w th C i ∈C ∗

min

C ∗

SSW (

i =1

x 1 ∈C i

x 2 ∈C i

For algorithms that try to minimize this criterion, like the well-known

k-Means algorithm [39], an application specific clustering can then be

obtained by choosing an appropriate dissimilarity measure. Density-

based algorithms, like DBSCAN [24], form clusters of points that have a

high spatial density. Subspace clustering algorithms specialize on finding

clusters in lower dimensional subspaces of the whole data space. For

a good overview of different types of clustering algorithms and their

objectives, see Han [30] and Kriegel et al. [38].

The aforementioned algorithms assume the whole data set to be in

main memory or at least they ignore the time needed for accessing the

Managing and Mining Sensor Data

Search WWH ::

Custom Search

Home