Advanced Cluster Analysis - Data Mining: Concepts and Techniques


11 Advanced Cluster Analysis

You learned the fundamentals of cluster analysis in Chapter 10. In this chapter, we discuss

advanced topics of cluster analysis. Specifically, we investigate four major perspectives:

Probabilistic model-based clustering: Section 11.1 introduces a general framework

and a method for deriving clusters where each object is assigned a probability of

belonging to a cluster. Probabilistic model-based clustering is widely used in many

data mining applications such as text mining.

Clustering high-dimensional data: When the dimensionality is high, conventional

distance measures can be dominated by noise. Section 11.2 introduces fundamental

methods for cluster analysis on high-dimensional data.

Clustering graph and network data: Graph and network data are increasingly popular

in applications such as online social networks, the World Wide Web, and digital

libraries. In Section 11.3, you will study the key issues in clustering graph and

network data, including similarity measurement and clustering methods.

Clustering with constraints: In our discussion so far, we have not assumed any constraints

in clustering. In some applications, however, various constraints may exist.

These constraints may arise from background knowledge or the spatial distribution of

the objects. You will learn how to conduct cluster analysis with different kinds of

constraints in Section 11.4.

By the end of this chapter, you will have a good grasp of the issues and techniques

regarding advanced cluster analysis.
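
To make the first perspective concrete, the following minimal sketch contrasts assigning an object to exactly one cluster with assigning it a probability of belonging to each cluster. It assumes a hypothetical mixture of two one-dimensional Gaussian clusters whose weights, means, and standard deviations are made up for illustration; the names gaussian_pdf, soft_assign, and hard_assign are likewise illustrative and do not come from the text.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assign(x, components):
    """Return the probability that x belongs to each cluster.

    components is a list of (weight, mean, std) tuples. The result is
    obtained with Bayes' rule: weight * density, normalized to sum to 1.
    """
    joint = [w * gaussian_pdf(x, m, s) for (w, m, s) in components]
    total = sum(joint)
    return [p / total for p in joint]


def hard_assign(x, components):
    """Return the index of the single most probable cluster for x."""
    probs = soft_assign(x, components)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical model (assumed values, not taken from the text):
# two equally weighted clusters centered at 0 and 4, each with std 1.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

for x in (0.0, 2.0, 3.5):
    probs = soft_assign(x, components)
    print(x, [round(p, 3) for p in probs], "hard:", hard_assign(x, components))
```

An object midway between the two assumed cluster centers receives equal membership probabilities, which a single hard label cannot express; an object close to one center receives a probability near 1 for that cluster, so the two views agree.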

11.1 Probabilistic Model-Based Clustering

In all the cluster analysis methods we have discussed so far, each data object can be

assigned to only one of a number of clusters. This cluster assignment rule is required

in some applications such as assigning customers to marketing managers. However,
