Database Reference
In-Depth Information
The algorithms usually have full control over where to place the data.
Moreover, the cost model for communication only takes into account
the time needed for transferral, but not the consumption of energy. In
WSNs, however, there is much less control over the data partitioning.
For example, application constraints may prohibit certain combinations
of sensor placements. Also, the most limiting factor in WSNs is energy,
not necessarily time. For this reason, algorithms developed according to
the parallel computing paradigm are usually not well-suited for WSNs.
In comparison to the large number of algorithms that form clusters of
sensor nodes, currently only few algorithms exist that eciently cluster
the sensor measurements themselves.
Generally, clustering algorithms situated in and developed for peer-
to-peer networks are good candidates for use in WSNs, especially those
that mostly rely on local computations and communication with a lim-
ited number of nearest neighbor nodes only. Datta et al. [18] have
introduced two distributed variants of the k -Means algorithm. The first
variant, LSP2P (Local Synchronized-Based P2P) k -Means, is based on
a more general local algorithm for mining data streams in distributed
systems [58]. It carries out repeated iterations of a modified k -Means
at each local node and collects newly calculated centroids and cluster
counts only from its immediate neighbors to produce the centroids for
the next iteration. Nodes terminate if these new centroids don't differ
substantially from the old ones. The algorithm requires no global syn-
chronization and can be extended to a dynamic environment, in which
nodes enter and leave the network. Communication costs are shown
to be independent of the number of observations to cluster and the to-
tal amount of communication is O ( nI ( K + L )), where n is the number
of nodes, I the number of iterations, K the number of clusters and L
the maximum number of neighbors. It was shown empirically that the
algorithm yields similar accuracy as a centralized version of k -Means,
however, proving convergence or bounds on the accuracy appears to be
a hard problem. The second variant, USP2P (P2P k -Means Clustering
Based on Uniform Node Sampling), improves the work by Bandyopad-
hyay et al. [6] and selects s nodes randomly uniformly by a random walk
strategy [19] to update centroids in each iteration. For a static network,
USP2P provides an accuracy guarantee. Communication costs are up-
per bounded by O ( Ms log( n )), where M denotes the maximum allowed
number of iterations by source node, s is the random walk length and n
is the number of nodes.
Nowak [44] has introduced DEM, a distributed expectation maximiza-
tion algorithm for clustering data from a Gaussian mixture distribution,
with a particular focus on sensor networks. DEM utilizes an incremen-
Search WWH ::




Custom Search