kinds of join indices based on the computation of the shortest paths: (1) VV indices,
for any pair of obstacle vertices, and (2) MV indices, for any pair of microcluster and
obstacle vertex. Use of the indices helps further optimize the overall performance.
Using such precomputation and optimization strategies, the distance between any
two points (at the granularity level of a microcluster) can be computed efficiently.
Thus, the clustering process can be performed in a manner similar to a typical efficient
k-medoids algorithm, such as CLARANS, and achieve good clustering quality for large
data sets.
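
To illustrate why precomputed distances make a k-medoids style search cheap, here is a minimal sketch. It assumes the pairwise distances (for example, obstacle-aware shortest-path distances between microclusters) are already available in a table `dist`. The function names are hypothetical, and the search is a plain greedy swap-based local search, not CLARANS itself, which samples candidate swaps at random.

```python
def assignment_cost(dist, medoids):
    """Sum, over all points, of the distance to the nearest medoid.

    dist is a precomputed square table: dist[i][j] is the distance
    between points i and j (assumed to be given).
    """
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids_swap(dist, medoids):
    """Greedy local search: replace a medoid with a non-medoid while cost drops."""
    medoids = list(medoids)
    best = assignment_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for cand in range(len(dist)):
                if cand in medoids:
                    continue
                # Try swapping the medoid at position pos with candidate cand.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = assignment_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best


# Toy data: six points on a line, with plain absolute differences
# standing in for the precomputed distances.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
result = k_medoids_swap(dist, [0, 1])
```

Every cost evaluation is a series of table lookups, so the quality of the precomputation directly determines how fast the search runs.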
11.5 Summary
In conventional cluster analysis, an object is assigned to one cluster exclusively.
However, in some applications, there is a need to assign an object to one or more clusters
in a fuzzy or probabilistic way. Fuzzy clustering and probabilistic model-based
clustering allow an object to belong to one or more clusters. A partition matrix records
the degree to which each object belongs to each cluster.
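
As a small illustration of a partition matrix, the sketch below checks a candidate matrix against three commonly used conditions, which are assumptions here rather than part of this summary: every membership degree lies in [0, 1], each object's degrees sum to 1, and no cluster is empty or absorbs all objects completely. The function name `is_partition_matrix` is hypothetical.

```python
def is_partition_matrix(M, tol=1e-9):
    """Check a fuzzy partition matrix (rows = objects, columns = clusters).

    Assumed conditions:
      1. every entry is in [0, 1];
      2. each row sums to 1;
      3. each column sum is strictly between 0 and the number of objects.
    """
    n = len(M)
    if n == 0 or not M[0]:
        return False
    k = len(M[0])
    for row in M:
        if len(row) != k:
            return False
        if any(w < 0.0 or w > 1.0 for w in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    for j in range(k):
        column_sum = sum(row[j] for row in M)
        if not (0.0 < column_sum < n):
            return False
    return True


# Three objects, two clusters: the second object belongs partly to both.
M = [[1.0, 0.0],
     [0.3, 0.7],
     [0.0, 1.0]]
```

A conventional (exclusive) clustering is the special case in which every entry is 0 or 1.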
Probabilistic model-based clustering assumes that a cluster is a parameterized
distribution. Using the data to be clustered as the observed samples, we can estimate the
parameters of the clusters.
A mixture model assumes that a set of observed objects is a mixture of instances from
multiple probabilistic clusters. Conceptually, each observed object is generated
independently by first choosing a probabilistic cluster according to the probabilities of the
clusters, and then choosing a sample according to the probability density function of
the chosen cluster.
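
The two-step generative process described above can be sketched as follows. The example assumes univariate Gaussian clusters, each given as a `(mean, standard deviation)` pair. That choice, the function name `sample_mixture`, and the fixed seed are illustrative assumptions only.

```python
import random


def sample_mixture(weights, components, n, seed=0):
    """Draw n objects from a mixture of univariate Gaussian clusters.

    weights[j]    -- probability of choosing cluster j
    components[j] -- (mean, std_dev) of cluster j (Gaussian is an assumption)
    Returns a list of (cluster_index, value) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: choose a cluster according to the cluster probabilities.
        j = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # Step 2: draw a value from the chosen cluster's density.
        mu, sigma = components[j]
        samples.append((j, rng.gauss(mu, sigma)))
    return samples


samples = sample_mixture([0.3, 0.7], [(0.0, 1.0), (10.0, 2.0)], n=100)
```

In a clustering task only the values are observed. The cluster index of each object is hidden and has to be inferred.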
An expectation-maximization algorithm is a framework for approaching maximum
likelihood or maximum a posteriori estimates of parameters in statistical models.
Expectation-maximization algorithms can be used to compute fuzzy clustering and
probabilistic model-based clustering.
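
As one concrete instance of the framework, the sketch below runs expectation-maximization for a mixture of univariate Gaussians. The Gaussian assumption, the function names, the fixed iteration count, and the `min_sigma` floor (which keeps a cluster from collapsing onto a single point) are choices made for illustration, not part of the general framework.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data, mus, sigmas, weights, iterations=50, min_sigma=1e-3):
    """EM for a 1-D Gaussian mixture, starting from the given parameters.

    Returns (mus, sigmas, weights, resp), where resp[i][j] is the
    estimated probability that object i belongs to cluster j.
    """
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    k, n = len(mus), len(data)
    resp = []
    for _ in range(iterations):
        # E-step: membership probabilities under the current parameters.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(p)
            if total == 0.0:
                # Numerical underflow: fall back to uniform memberships.
                resp.append([1.0 / k] * k)
            else:
                resp.append([pj / total for pj in p])
        # M-step: re-estimate the parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # keep the old parameters for an empty cluster
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return mus, sigmas, weights, resp


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mus, sigmas, weights, resp = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

The rows of `resp` form a partition matrix, which is how the same procedure yields a fuzzy or probabilistic clustering. Like other EM procedures, this one converges to a local optimum that depends on the starting parameters.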
High-dimensional data pose several challenges for cluster analysis, including how to
model high-dimensional clusters and how to search for such clusters.
There are two major categories of clustering methods for high-dimensional data:
subspace clustering methods and dimensionality reduction methods. Subspace
clustering methods search for clusters in subspaces of the original space.
Examples include subspace search methods, correlation-based clustering methods, and
biclustering methods. Dimensionality reduction methods create a new space of
lower dimensionality and search for clusters there.
Biclustering methods cluster objects and attributes simultaneously. Types of
biclusters include biclusters with constant values, constant values on rows/columns,
coherent values, and coherent evolutions on rows/columns. Two major types of
biclustering methods are optimization-based methods and enumeration methods.
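
To make three of the bicluster types concrete, the following sketch tests a submatrix (a list of rows) for each pattern. It assumes the additive model for coherent values, in which every entry equals a constant plus a row offset plus a column offset. The function names and the tolerance `tol` are hypothetical.

```python
def is_constant(sub, tol=1e-9):
    """All entries of the submatrix share one value."""
    c = sub[0][0]
    return all(abs(v - c) <= tol for row in sub for v in row)


def is_constant_on_rows(sub, tol=1e-9):
    """Each row is constant, but different rows may hold different values."""
    return all(abs(v - row[0]) <= tol for row in sub for v in row)


def is_coherent_additive(sub, tol=1e-9):
    """Coherent values under the assumed additive model.

    Every row must equal the first row shifted by a single row-specific
    offset, i.e. e[i][j] = c + alpha[i] + beta[j].
    """
    base = sub[0]
    for row in sub[1:]:
        offset = row[0] - base[0]
        if any(abs((v - b) - offset) > tol for v, b in zip(row, base)):
            return False
    return True


shifted = [[1, 2, 3],
           [4, 5, 6]]   # second row = first row + 3
```

Here `shifted` has coherent values but is constant neither overall nor on rows. The checks nest: a constant bicluster is also constant on rows, and a bicluster that is constant on rows is also coherent under the additive model.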
 