DISTRIBUTED DATA MINING IN SENSOR NETWORKS - Managing and Mining Sensor Data

Database Reference

In-Depth Information

The algorithms usually have full control over where to place the data.

Moreover, the cost model for communication only takes into account

the time needed for transferral, but not the consumption of energy. In

WSNs, however, there is much less control over the data partitioning.

For example, application constraints may prohibit certain combinations

of sensor placements. Also, the most limiting factor in WSNs is energy,

not necessarily time. For this reason, algorithms developed according to

the parallel computing paradigm are usually not well-suited for WSNs.

In comparison to the large number of algorithms that form clusters of

sensor nodes, currently only few algorithms exist that eciently cluster

the sensor measurements themselves.

Generally, clustering algorithms situated in and developed for peer-

to-peer networks are good candidates for use in WSNs, especially those

that mostly rely on local computations and communication with a lim-

ited number of nearest neighbor nodes only. Datta et al. [18] have

introduced two distributed variants of the k -Means algorithm. The first

variant, LSP2P (Local Synchronized-Based P2P) k -Means, is based on

a more general local algorithm for mining data streams in distributed

systems [58]. It carries out repeated iterations of a modified k -Means

at each local node and collects newly calculated centroids and cluster

counts only from its immediate neighbors to produce the centroids for

the next iteration. Nodes terminate if these new centroids don't differ

substantially from the old ones. The algorithm requires no global syn-

chronization and can be extended to a dynamic environment, in which

nodes enter and leave the network. Communication costs are shown

to be independent of the number of observations to cluster and the to-

tal amount of communication is O ( nI ( K + L )), where n is the number

of nodes, I the number of iterations, K the number of clusters and L

the maximum number of neighbors. It was shown empirically that the

algorithm yields similar accuracy as a centralized version of k -Means,

however, proving convergence or bounds on the accuracy appears to be

a hard problem. The second variant, USP2P (P2P k -Means Clustering

Based on Uniform Node Sampling), improves the work by Bandyopad-

hyay et al. [6] and selects s nodes randomly uniformly by a random walk

strategy [19] to update centroids in each iteration. For a static network,

USP2P provides an accuracy guarantee. Communication costs are up-

per bounded by O ( Ms log( n )), where M denotes the maximum allowed

number of iterations by source node, s is the random walk length and n

is the number of nodes.

Nowak [44] has introduced DEM, a distributed expectation maximiza-

tion algorithm for clustering data from a Gaussian mixture distribution,

with a particular focus on sensor networks. DEM utilizes an incremen-

Search WWH ::

Custom Search

Home