Databases Reference
In-Depth Information
10.2 Suppose that the data mining task is to cluster points (with
.
x , y
/
representing location)
into three clusters, where the points are
A 1 .
2, 10
/
, A 2 .
2, 5
/
, A 3 .
8, 4
/
, B 1 .
5, 8
/
, B 2 .
7, 5
/
, B 3 .
6, 4
/
, C 1 .
1, 2
/
, C 2 .
4, 9
/
.
The distance function is Euclidean distance. Suppose initially we assign A 1 , B 1 , and C 1
as the center of each cluster, respectively. Use the k-means algorithm to show only
(a) The three cluster centers after the first round of execution.
(b) The final three clusters.
10.3 Use an example to show why the k -means algorithm may not find the global optimum,
that is, optimizing the within-cluster variation.
10.4 For the k -means algorithm, it is interesting to note that by choosing the initial cluster
centers carefully, we may be able to not only speed up the algorithm's convergence, but
also guarantee the quality of the final clustering. The k -means CC algorithm is a vari-
ant of k -means, which chooses the initial centers as follows. First, it selects one center
uniformly at random from the objects in the data set. Iteratively, for each object p other
than the chosen center, it chooses an object as the new center. This object is chosen at
random with probability proportional to dist
2 , where dist
is the distance from p
to the closest center that has already been chosen. The iteration continues until k centers
are selected.
Explain why this method will not only speed up the convergence of the k -means
algorithm, but also guarantee the quality of the final clustering results.
.
p
/
.
p
/
10.5 Provide the pseudocode of the object reassignment step of the PAM algorithm.
10.6 Both k-means and k-medoids algorithms can perform effective clustering.
(a) Illustrate the strength and weakness of k-means in comparison with k-medoids .
(b) Illustrate the strength and weakness of these schemes in comparison with a hierar-
chical clustering scheme (e.g., AGNES).
10.7 Prove that in DBSCAN, the density-connectedness is an equivalence relation.
10.8 Prove that in DBSCAN, for a fixed MinPts value and two neighborhood thresholds,
1 < 2 , a cluster C with respect to
1 and MinPts must be a subset of a cluster C 0 with
2 and MinPts .
10.9 Provide the pseudocode of the OPTICS algorithm.
respect to
10.10 Why is it that BIRCH encounters difficulties in finding clusters of arbitrary shape but
OPTICS does not? Propose modifications to BIRCH to help it find clusters of arbitrary
shape.
10.11 Provide the pseudocode of the step in CLIQUE that finds dense cells in all subspaces.
 
Search WWH ::




Custom Search