Advanced Cluster Analysis - Data Mining: Concepts and Techniques

subspaces. Various pruning techniques are explored to reduce the number of higher-

dimensional subspaces that need to be searched. CLIQUE is an example of a

bottom-up approach.

Top-down approaches start from the full space and search smaller and smaller

subspaces recursively. Top-down approaches are effective only if the locality assumption

holds, which requires that the subspace of a cluster can be determined by the local

neighborhood.

Example 11.10 PROCLUS, a top-down subspace approach. PROCLUS is a k-medoid-like method

that first generates k potential cluster centers for a high-dimensional data set using a

sample of the data set. It then refines the subspace clusters iteratively. In each iteration,

for each of the current k medoids, PROCLUS considers the local neighborhood

of the medoid in the whole data set, and identifies a subspace for the cluster by minimizing

the standard deviation of the distances of the points in the neighborhood to

the medoid on each dimension. Once all the subspaces for the medoids are determined,

each point in the data set is assigned to the closest medoid according to the

corresponding subspace. Clusters and possible outliers are identified. In the next iteration,

new medoids replace existing ones if doing so improves the clustering quality.
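
The subspace-selection and assignment steps of one such iteration can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the full PROCLUS algorithm: the function names (manhattan, find_subspaces, assign), the fixed number of dimensions per cluster (n_dims), the choice of the distance to the nearest other medoid as the neighborhood radius, and the use of the mean per-dimension distance as the spread measure are all choices made for this example. Medoid sampling, medoid replacement, and outlier detection are omitted.

```python
from statistics import fmean


def manhattan(p, q, dims=None):
    """Manhattan distance between p and q, restricted to `dims` if given."""
    dims = range(len(p)) if dims is None else dims
    return sum(abs(p[j] - q[j]) for j in dims)


def find_subspaces(points, medoids, n_dims):
    """For each medoid, keep the n_dims dimensions along which its local
    neighborhood lies closest to it (assumes at least two medoids)."""
    subspaces = []
    for i, m in enumerate(medoids):
        # Assumed locality radius: distance to the nearest other medoid.
        radius = min(manhattan(m, o) for k, o in enumerate(medoids) if k != i)
        neighbors = [p for p in points if manhattan(p, m) < radius]
        # Average distance to the medoid, measured separately per dimension.
        spread = [fmean([abs(p[j] - m[j]) for p in neighbors])
                  for j in range(len(m))]
        tightest = sorted(range(len(m)), key=lambda j: spread[j])[:n_dims]
        subspaces.append(sorted(tightest))
    return subspaces


def assign(points, medoids, subspaces):
    """Assign each point to the medoid that is closest within that
    medoid's own subspace (distance averaged over the subspace size)."""
    labels = []
    for p in points:
        dists = [manhattan(p, m, dims) / len(dims)
                 for m, dims in zip(medoids, subspaces)]
        labels.append(dists.index(min(dists)))
    return labels


# Toy data: cluster A is tight in dimensions 0 and 1, cluster B in 1 and 2.
cluster_a = [(5.0, 1.0, 4.0), (5.1, 0.9, 2.0), (4.9, 1.1, 6.0),
             (5.0, 1.2, 3.0), (5.2, 1.0, 5.0)]
cluster_b = [(5.0, 20.0, 4.0), (3.0, 20.1, 3.9), (7.0, 19.9, 4.1),
             (4.0, 20.2, 4.0), (6.0, 20.0, 4.2)]
points = cluster_a + cluster_b
medoids = [cluster_a[0], cluster_b[0]]  # assumed already chosen

subspaces = find_subspaces(points, medoids, n_dims=2)
labels = assign(points, medoids, subspaces)
print(subspaces)  # [[0, 1], [1, 2]]
print(labels)
```

On this toy data each medoid recovers the two dimensions in which its cluster is compact, and every point is assigned to the medoid of its own cluster. A full implementation would repeat these steps, swapping in new medoids whenever that improves the clustering quality.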

Correlation-Based Clustering Methods

While subspace search methods search for clusters with a similarity that is measured

using conventional metrics like distance or density, correlation-based approaches can

further discover clusters that are defined by advanced correlation models.

Example 11.11 A correlation-based approach using PCA. As an example, a PCA-based approach first

applies PCA (Principal Components Analysis; see Chapter 3) to derive a set of new,

uncorrelated dimensions, and then mines clusters in the new space or its subspaces. In

addition to PCA, other space transformations may be used, such as the Hough transform

or fractal dimensions.
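
The PCA-based idea can be illustrated with a minimal Python sketch. It assumes two-dimensional data, so the principal axes can be computed in closed form from the covariance matrix, and it uses a largest-gap split along the second principal component as a stand-in for a real clustering algorithm. The function names (pca_2d, split_at_largest_gap) and the toy data (two parallel lines) are made up for this example.

```python
import math


def pca_2d(points):
    """Return the mean and the two principal axes (unit vectors, ordered
    by decreasing variance) of a list of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2


def split_at_largest_gap(scores):
    """Two-way split of 1-D values at the midpoint of their largest gap."""
    order = sorted(scores)
    gaps = [(order[i + 1] - order[i], i) for i in range(len(order) - 1)]
    _, i = max(gaps)
    threshold = (order[i] + order[i + 1]) / 2
    return [0 if s < threshold else 1 for s in scores]


# Two groups of points lying on parallel lines: y = x and y = x + 4.
line_a = [(float(x), float(x)) for x in range(20)]
line_b = [(float(x), x + 4.0) for x in range(20)]
points = line_a + line_b

(mx, my), pc1, pc2 = pca_2d(points)
# Coordinates in the one-dimensional subspace spanned by the second component.
scores = [(x - mx) * pc2[0] + (y - my) * pc2[1] for x, y in points]
labels = split_at_largest_gap(scores)
print(labels)
```

Here the first principal component runs roughly along the lines, so the two groups differ mainly in their second-component score and separate cleanly in that one-dimensional subspace.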

For additional details on subspace search methods and correlation-based clustering

methods, please refer to the bibliographic notes (Section 11.7).

Biclustering Methods

In some applications, we want to cluster both objects and attributes simultaneously.

The resulting clusters are known as biclusters and meet four requirements: (1) only a

small set of objects participates in a cluster; (2) a cluster involves only a small number of

attributes; (3) an object may participate in multiple clusters or in no cluster at all;

and (4) an attribute may be involved in multiple clusters or in no cluster at all.

Section 11.2.3 discusses biclustering in detail.
