Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

10.12 Present conditions under which density-based clustering is more suitable than partitioning-based clustering and hierarchical clustering. Give application examples to support your argument.

10.13 Give an example of how specific clustering methods can be integrated, for example, where one clustering algorithm is used as a preprocessing step for another. In addition, provide reasoning as to why the integration of two methods may sometimes lead to improved clustering quality and efficiency.

10.14 Clustering is recognized as an important data mining task with broad applications. Give one application example for each of the following cases:
(a) An application that uses clustering as a major data mining function.
(b) An application that uses clustering as a preprocessing tool for data preparation for other data mining tasks.

10.15 Data cubes and multidimensional databases contain nominal, ordinal, and numeric data in hierarchical or aggregate forms. Based on what you have learned about the clustering methods, design a clustering method that finds clusters in large data cubes effectively and efficiently.

10.16 Describe each of the following clustering algorithms in terms of the following criteria: (1) shapes of clusters that can be determined; (2) input parameters that must be specified; and (3) limitations.
(a) k-means
(b) k-medoids
(c) CLARA
(d) BIRCH
(e) CHAMELEON
(f) DBSCAN
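As a starting point for criterion (2) in part (f), the following is a minimal sketch of a density-based clustering pass in the style of DBSCAN. It is not a reference implementation: the function name `dbscan` and the parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of points, including the point itself, needed for a dense neighborhood) are assumptions chosen for illustration, and the neighbor search is a brute-force scan.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Brute-force scan: indices of all points within eps of point i
        # (the point itself is included).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # expand only through core points
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is not an input here; it follows from the two density parameters, and isolated points are reported as noise rather than forced into a cluster.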

10.17 Human eyes are fast and effective at judging the quality of clustering methods for 2-D data. Can you design a data visualization method that may help humans visualize data clusters and judge the clustering quality for 3-D data? What about for even higher-dimensional data?

10.18 Suppose that you are to allocate a number of automatic teller machines (ATMs) in a given region so as to satisfy a number of constraints. Households or workplaces may be clustered so that typically one ATM is assigned per cluster. The clustering, however, may be constrained by two factors: (1) obstacle objects (i.e., there are bridges, rivers, and highways that can affect ATM accessibility), and (2) additional user-specified constraints such as that each ATM should serve at least 10,000 households. How can a clustering algorithm such as k-means be modified for quality clustering under both constraints?
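One possible direction for factor (1) only is sketched below: the assignment step of k-means uses an obstacle-aware distance instead of plain Euclidean distance. Everything here is an illustrative assumption rather than a prescribed solution: obstacles are modeled as 2-D line segments, crossing one adds a fixed `penalty` to the distance, and the helper names (`segments_cross`, `obstacle_distance`, `constrained_kmeans`) are hypothetical. The minimum-households constraint in factor (2) is not handled, and the center update still uses the ordinary mean.

```python
import math

def segments_cross(p, q, a, b):
    """True if segment pq properly crosses segment ab (touching is ignored)."""
    def orient(u, v, w):
        # Sign of the cross product (v - u) x (w - u).
        val = (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])
        return (val > 0) - (val < 0)
    return (orient(p, q, a) * orient(p, q, b) < 0
            and orient(a, b, p) * orient(a, b, q) < 0)

def obstacle_distance(p, c, obstacles, penalty):
    """Euclidean distance plus a fixed penalty per obstacle crossed."""
    d = math.dist(p, c)
    for a, b in obstacles:
        if segments_cross(p, c, a, b):
            d += penalty
    return d

def constrained_kmeans(points, centers, obstacles, penalty=10.0, iters=20):
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center under the obstacle-aware distance.
        labels = [min(range(len(centers)),
                      key=lambda j: obstacle_distance(p, centers[j],
                                                      obstacles, penalty))
                  for p in points]
        # Update step: ordinary mean of each cluster's members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

households = [(-3, 0), (-0.4, 0), (0.4, 0), (3, 0)]
river = [((0, -10), (0, 10))]          # one obstacle along the line x = 0
start = [(-3, 0), (1, 0)]
with_river, _ = constrained_kmeans(households, start, river)
without_river, _ = constrained_kmeans(households, start, [])
```

In this toy layout the household at (-0.4, 0) is grouped with the center across the river when obstacles are ignored, and with the center on its own bank when the river is taken into account.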

10.19 For constraint-based clustering, aside from having the minimum number of customers in each cluster (for ATM allocation) as a constraint, there can be many other kinds of
