Advanced Cluster Analysis - Data Mining: Concepts and Techniques

kinds of join indices based on the computation of the shortest paths: (1) VV indices, for any pair of obstacle vertices, and (2) MV indices, for any pair of a microcluster and an obstacle vertex. Using these indices further optimizes the overall performance.

Using such precomputation and optimization strategies, the distance between any two points (at the granularity level of a microcluster) can be computed efficiently. Thus, the clustering process can be performed in a manner similar to a typical efficient k-medoids algorithm, such as CLARANS, and can achieve good clustering quality on large data sets.
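The two precomputed indices can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation. It assumes the visibility information is already computed: which obstacle vertices can see each other, which vertices each microcluster center can see, and which center pairs see each other directly. It also uses Floyd-Warshall as one possible way to fill the VV index. All function and variable names (`build_vv_index`, `build_mv_index`, `obstructed_distance`, `visible_vertices`, `visible_pairs`) are hypothetical.

```python
import math


def build_vv_index(vertices, vis_edges):
    """Hypothetical VV index: all-pairs shortest obstacle-avoiding distances
    between obstacle vertices, filled with Floyd-Warshall over an assumed,
    precomputed visibility graph given as vertex-index pairs."""
    n = len(vertices)
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j in vis_edges:
        w = math.dist(vertices[i], vertices[j])  # unobstructed straight segment
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def build_mv_index(centers, visible_vertices, vertices, vv):
    """Hypothetical MV index: mv[m][v] is the shortest obstacle-avoiding
    distance from microcluster center m to obstacle vertex v."""
    mv = []
    for m, c in enumerate(centers):
        mv.append([
            min((math.dist(c, vertices[u]) + vv[u][v] for u in visible_vertices[m]),
                default=math.inf)
            for v in range(len(vertices))
        ])
    return mv


def obstructed_distance(a, b, centers, vertices, mv, visible_vertices, visible_pairs):
    """Distance between microclusters a and b that detours around obstacles."""
    if frozenset((a, b)) in visible_pairs:  # direct line of sight
        return math.dist(centers[a], centers[b])
    # Otherwise: precomputed path from a to a vertex v that b can see, then v -> b.
    return min((mv[a][v] + math.dist(vertices[v], centers[b]) for v in visible_vertices[b]),
               default=math.inf)


# Usage: one square obstacle [1, 3] x [-1, 1] between two microcluster centers.
vertices = [(1, -1), (1, 1), (3, 1), (3, -1)]
vis_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # obstacle sides; diagonals are blocked
centers = [(0, 0), (4, 0)]
visible_vertices = [[0, 1], [2, 3]]           # vertices each center can see
visible_pairs = set()                         # the two centers cannot see each other

vv = build_vv_index(vertices, vis_edges)
mv = build_mv_index(centers, visible_vertices, vertices, vv)
dist = obstructed_distance(0, 1, centers, vertices, mv, visible_vertices, visible_pairs)
print(round(dist, 4))  # 4.8284, i.e. 2 + 2*sqrt(2)
```

Both indices are built once. After that, each distance query made during a CLARANS-style k-medoids search reduces to a minimum over the few vertices visible from one microcluster.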

11.5 Summary

In conventional cluster analysis, each object is assigned exclusively to one cluster. However, some applications need to assign an object to one or more clusters in a fuzzy or probabilistic way. Fuzzy clustering and probabilistic model-based clustering allow an object to belong to more than one cluster. A partition matrix records the degree to which each object belongs to each cluster.
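A partition matrix can be checked with a short Python sketch. It assumes the usual fuzzy-partition conditions for an n x k matrix of membership degrees: (1) every entry lies in [0, 1]; (2) each object's memberships sum to 1; (3) every cluster's total membership is strictly between 0 and n. The function name `is_fuzzy_partition` and the tolerance parameter are illustrative, not from the text.

```python
def is_fuzzy_partition(w, tol=1e-9):
    """Check an n x k partition matrix w, where w[i][j] is the membership
    degree of object i in cluster j (assumed conditions, see lead-in)."""
    n = len(w)
    if n == 0:
        return False
    k = len(w[0])
    for row in w:
        if len(row) != k:
            return False
        # (1) every membership degree lies in [0, 1]
        if any(x < -tol or x > 1 + tol for x in row):
            return False
        # (2) each object's memberships sum to 1
        if abs(sum(row) - 1.0) > tol:
            return False
    for j in range(k):
        # (3) no cluster is empty, and no cluster fully owns every object
        col = sum(row[j] for row in w)
        if not (tol < col < n - tol):
            return False
    return True


fuzzy = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
hard = [[1, 0], [0, 1], [1, 0]]       # exclusive assignment is a special case
empty_cluster = [[1, 0], [1, 0]]      # cluster 1 receives no membership at all
print(is_fuzzy_partition(fuzzy), is_fuzzy_partition(hard), is_fuzzy_partition(empty_cluster))
# True True False
```

A conventional hard clustering is a valid partition matrix whose entries are all 0 or 1. This is why fuzzy clustering generalizes the exclusive case.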

Probabilistic model-based clustering assumes that a cluster is a parameterized distribution. Using the data to be clustered as the observed samples, we can estimate the parameters of the clusters.

A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters. Conceptually, each observed object is generated independently by first choosing a probabilistic cluster according to the probabilities of the clusters, and then choosing a sample according to the probability density function of the chosen cluster.
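The two-step generative process can be sketched in Python. As an assumption for illustration, each probabilistic cluster here is a one-dimensional Gaussian. The function name `sample_mixture` and the example parameters are hypothetical.

```python
import random


def sample_mixture(n, weights, means, stds, seed=0):
    """Draw n objects from a hypothetical 1-D Gaussian mixture.
    Returns (cluster_index, value) pairs; in real data the index is hidden."""
    rng = random.Random(seed)
    clusters = range(len(weights))
    samples = []
    for _ in range(n):
        # Step 1: choose a probabilistic cluster according to the cluster probabilities.
        j = rng.choices(clusters, weights=weights, k=1)[0]
        # Step 2: draw a sample from the chosen cluster's density function.
        samples.append((j, rng.gauss(means[j], stds[j])))
    return samples


data = sample_mixture(1000, weights=[0.3, 0.7], means=[0.0, 5.0], stds=[1.0, 0.5])
share0 = sum(1 for j, _ in data if j == 0) / len(data)
print(round(share0, 2))  # close to 0.3
```

Clustering reverses this process. Given only the values, it estimates the weights, means, and standard deviations that most plausibly generated them.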

An expectation-maximization algorithm is a framework for approaching maximum likelihood or maximum a posteriori estimates of parameters in statistical models. Expectation-maximization algorithms can be used to compute fuzzy clustering and probabilistic model-based clustering.
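The following Python sketch shows expectation-maximization for a one-dimensional Gaussian mixture, an assumed model chosen for brevity. The E-step computes, for each object, the posterior probability of each cluster. The M-step re-estimates weights, means, and standard deviations from those soft assignments. The function names `normal_pdf` and `em_gmm_1d`, the fixed iteration count, and the `min_std` floor are illustrative assumptions. The sketch also assumes the initial parameters are reasonable.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_gmm_1d(xs, means, stds, weights, iters=50, min_std=1e-3):
    """EM for a 1-D Gaussian mixture. Returns the estimated parameters and the
    responsibility matrix (object x cluster posterior probabilities)."""
    means, stds, weights = list(means), list(stds), list(weights)
    n, k = len(xs), len(means)
    resp = []
    for _ in range(iters):
        # E-step: probability that each object was generated by each cluster.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return means, stds, weights, resp


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, stds, weights, resp = em_gmm_1d(xs, means=[0.0, 6.0], stds=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])  # [1.0, 5.0] [0.5, 0.5]
```

Each row of `resp` sums to 1. The final responsibility matrix is therefore a partition matrix, which is how the same EM loop yields a fuzzy clustering as well as fitted cluster parameters.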

High-dimensional data pose several challenges for cluster analysis, including how to model high-dimensional clusters and how to search for such clusters.

There are two major categories of clustering methods for high-dimensional data: subspace clustering methods and dimensionality reduction methods. Subspace clustering methods search for clusters in subspaces of the original space. Examples include subspace search methods, correlation-based clustering methods, and biclustering methods. Dimensionality reduction methods create a new space of lower dimensionality and search for clusters there.

Biclustering methods cluster objects and attributes simultaneously. Types of biclusters include biclusters with constant values, constant values on rows/columns, coherent values, and coherent evolutions on rows/columns. Two major types of biclustering methods are optimization-based methods and enumeration methods.
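One common way to measure how close a submatrix is to a bicluster with coherent values is the mean squared residue used by delta-bicluster methods (Cheng and Church). The residue of entry (i, j) is its value minus its row mean, minus its column mean, plus the submatrix mean. The score is zero exactly when the submatrix follows the additive model e_ij = c + alpha_i + beta_j. Constant-value and constant-row/column biclusters are special cases of that model. The Python sketch below computes this score. The function name `mean_squared_residue` and the sample matrices are illustrative.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    nr, nc = len(rows), len(cols)
    row_means = [sum(r) / nc for r in sub]
    col_means = [sum(sub[i][j] for i in range(nr)) / nr for j in range(nc)]
    overall = sum(row_means) / nr
    total = 0.0
    for i in range(nr):
        for j in range(nc):
            # Deviation from the additive (coherent-values) model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (nr * nc)


data = [
    [1, 5, 3, 9],
    [2, 6, 4, 0],   # columns 0-2: row 0 shifted by +1
    [6, 10, 8, 2],  # columns 0-2: row 0 shifted by +5
]
print(mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2]))  # 0.0
print(mean_squared_residue([[1, 2], [3, 10]], rows=[0, 1], cols=[0, 1]))  # 2.25
```

Adding column 3 to the first selection raises the score above zero. An optimization-based method can exploit this by greedily removing or adding rows and columns while the score stays below a chosen threshold.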
