Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

10.2 Suppose that the data mining task is to cluster points (with

.

x , y

/

representing location)

into three clusters, where the points are

A 1 .

2, 10

/

, A 2 .

2, 5

/

, A 3 .

8, 4

/

, B 1 .

5, 8

/

, B 2 .

7, 5

/

, B 3 .

6, 4

/

, C 1 .

1, 2

/

, C 2 .

4, 9

/

.

The distance function is Euclidean distance. Suppose initially we assign A 1 , B 1 , and C 1

as the center of each cluster, respectively. Use the k-means algorithm to show only

(a) The three cluster centers after the first round of execution.

(b) The final three clusters.

10.3 Use an example to show why the k -means algorithm may not find the global optimum,

that is, optimizing the within-cluster variation.

10.4 For the k -means algorithm, it is interesting to note that by choosing the initial cluster

centers carefully, we may be able to not only speed up the algorithm's convergence, but

also guarantee the quality of the final clustering. The k -means CC algorithm is a vari-

ant of k -means, which chooses the initial centers as follows. First, it selects one center

uniformly at random from the objects in the data set. Iteratively, for each object p other

than the chosen center, it chooses an object as the new center. This object is chosen at

random with probability proportional to dist

2 , where dist

is the distance from p

to the closest center that has already been chosen. The iteration continues until k centers

are selected.

Explain why this method will not only speed up the convergence of the k -means

algorithm, but also guarantee the quality of the final clustering results.

.

p

/

.

p

/

10.5 Provide the pseudocode of the object reassignment step of the PAM algorithm.

10.6 Both k-means and k-medoids algorithms can perform effective clustering.

(a) Illustrate the strength and weakness of k-means in comparison with k-medoids .

(b) Illustrate the strength and weakness of these schemes in comparison with a hierar-

chical clustering scheme (e.g., AGNES).

10.7 Prove that in DBSCAN, the density-connectedness is an equivalence relation.

10.8 Prove that in DBSCAN, for a fixed MinPts value and two neighborhood thresholds,

1 < 2 , a cluster C with respect to

1 and MinPts must be a subset of a cluster C 0 with

2 and MinPts .

10.9 Provide the pseudocode of the OPTICS algorithm.

respect to

10.10 Why is it that BIRCH encounters difficulties in finding clusters of arbitrary shape but

OPTICS does not? Propose modifications to BIRCH to help it find clusters of arbitrary

shape.

10.11 Provide the pseudocode of the step in CLIQUE that finds dense cells in all subspaces.

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home