10.12 Present conditions under which density-based clustering is more suitable than
partitioning-based clustering and hierarchical clustering. Give application examples to
support your argument.
10.13 Give an example of how specific clustering methods can be integrated, for example,
where one clustering algorithm is used as a preprocessing step for another. In addition,
provide reasoning as to why the integration of two methods may sometimes lead
to improved clustering quality and efficiency.
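As a hint for 10.13, here is a minimal sketch of one possible integration, not the expected answer. A cheap first pass summarizes the points into micro-clusters, and a second, more expensive pass merges only those summaries. The grid summarization is a simplified stand-in for a structure such as BIRCH's CF-tree, and the function names, the `cell` size, and the `link_dist` threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def micro_clusters(points, cell):
    """Phase 1: summarize 2-D points by grid cell as (centroid, count)."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(floor(x / cell), floor(y / cell))].append((x, y))
    summaries = []
    for members in buckets.values():
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        summaries.append(((cx, cy), n))
    return summaries


def merge_summaries(summaries, link_dist):
    """Phase 2: single-link merge of centroids within link_dist.

    Returns the sorted sizes (in original points) of the final clusters.
    """
    parent = list(range(len(summaries)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison runs over summaries, not over raw points.
    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            if dist(summaries[i][0], summaries[j][0]) <= link_dist:
                parent[find(i)] = find(j)

    sizes = defaultdict(int)
    for i, (_, n) in enumerate(summaries):
        sizes[find(i)] += n
    return sorted(sizes.values())


# Illustrative data only: two loose groups of points.
points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.1), (1.4, 0.6),
          (9.0, 9.1), (9.3, 9.4), (10.2, 9.8)]
summaries = micro_clusters(points, cell=1.0)
cluster_sizes = merge_summaries(summaries, link_dist=2.0)
```

The efficiency argument shows up in the quadratic loop of the second phase, which here compares four summaries rather than seven points. Whether quality also improves depends on how well the summaries preserve the cluster structure, which is part of what the exercise asks you to reason about.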
10.14 Clustering is recognized as an important data mining task with broad applications. Give
one application example for each of the following cases:
(a) An application that uses clustering as a major data mining function.
(b) An application that uses clustering as a preprocessing tool for data preparation for
other data mining tasks.
10.15 Data cubes and multidimensional databases contain nominal, ordinal, and numeric data
in hierarchical or aggregate forms. Based on what you have learned about the clustering
methods, design a clustering method that finds clusters in large data cubes effectively
and efficiently.
10.16 Describe each of the following clustering algorithms in terms of the following criteria:
(1) shapes of clusters that can be determined; (2) input parameters that must be
specified; and (3) limitations.
(a) k-means
(b) k-medoids
(c) CLARA
(d) BIRCH
(e) CHAMELEON
(f) DBSCAN
10.17 Human eyes are fast and effective at judging the quality of clustering methods for
2-D data. Can you design a data visualization method that may help humans visualize
data clusters and judge the clustering quality for 3-D data? What about for even
higher-dimensional data?
10.18 Suppose that you are to allocate a number of automatic teller machines (ATMs) in a
given region so as to satisfy a number of constraints. Households or workplaces may
be clustered so that typically one ATM is assigned per cluster. The clustering, however,
may be constrained by two factors: (1) obstacle objects (i.e., there are bridges, rivers, and
highways that can affect ATM accessibility), and (2) additional user-specified constraints
such as that each ATM should serve at least 10,000 households. How can a clustering
algorithm such as k-means be modified for quality clustering under both constraints?
10.19 For constraint-based clustering, aside from having the minimum number of customers
in each cluster (for ATM allocation) as a constraint, there can be many other kinds of
 