subspaces. Various pruning techniques are explored to reduce the number of higher-
dimensional subspaces that need to be searched. CLIQUE is an example of a
bottom-up approach.
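The pruning idea can be sketched in a few lines. This is a minimal illustration, not CLIQUE itself: it assumes the Apriori-style monotonicity property that CLIQUE relies on (a k-dimensional subspace can contain dense regions only if each of its (k-1)-dimensional projections does). The function name candidate_subspaces and the toy input are hypothetical.

```python
from itertools import combinations

def candidate_subspaces(dense_prev):
    """Join (k-1)-dimensional dense subspaces into k-dimensional candidates.

    A candidate is kept only if every one of its (k-1)-dimensional
    projections is itself in dense_prev (Apriori-style pruning).
    Assumes all subspaces in dense_prev have the same dimensionality.
    """
    dense_prev = {frozenset(s) for s in dense_prev}
    candidates = set()
    for a, b in combinations(dense_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # a and b do not differ in exactly one dimension
        projections = combinations(union, len(a))
        if all(frozenset(p) in dense_prev for p in projections):
            candidates.add(union)
    return candidates

# Hypothetical input: 2-D subspaces (sets of dimension indices) found dense.
dense_2d = [{0, 1}, {0, 2}, {1, 2}, {2, 3}]
candidates_3d = candidate_subspaces(dense_2d)
print(candidates_3d)
```

Here only the subspace {0, 1, 2} survives: {0, 2, 3} and {1, 2, 3} are pruned without touching the data, because {0, 3} and {1, 3} were not dense.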
Top-down approaches start from the full space and search smaller and smaller subspaces
recursively. Top-down approaches are effective only if the locality assumption
holds, which requires that the subspace of a cluster can be determined by the local
neighborhood.
Example 11.10 PROCLUS, a top-down subspace approach. PROCLUS is a k-medoid-like method
that first generates k potential cluster centers for a high-dimensional data set using a
sample of the data set. It then refines the subspace clusters iteratively. In each iteration,
for each of the current k medoids, PROCLUS considers the local neighborhood
of the medoid in the whole data set, and identifies a subspace for the cluster by minimizing
the standard deviation of the distances of the points in the neighborhood to
the medoid on each dimension. Once all the subspaces for the medoids are determined,
each point in the data set is assigned to the closest medoid according to the
corresponding subspace. Clusters and possible outliers are identified. In the next iteration,
new medoids replace existing ones if doing so improves the clustering quality.
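One refinement pass of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the published PROCLUS procedure: the local neighborhood is taken to be the points closer to a medoid than its nearest other medoid, the per-dimension spread is measured as the root-mean-square deviation from the medoid, every cluster gets the same number of dimensions (n_dims), and points are assigned by average per-dimension absolute difference within each medoid's subspace. Medoid replacement and outlier detection are omitted. All function names and the toy data are hypothetical.

```python
import math

def local_neighborhood(points, medoids, i):
    """Points closer to medoid i than its nearest other medoid (full space)."""
    m = medoids[i]
    radius = min(math.dist(m, o) for j, o in enumerate(medoids) if j != i)
    return [p for p in points if math.dist(p, m) < radius]

def select_dimensions(neighborhood, medoid, n_dims):
    """Pick the n_dims dimensions where the neighborhood is tightest."""
    spread = []
    for d in range(len(medoid)):
        mean_sq = sum((p[d] - medoid[d]) ** 2 for p in neighborhood) / len(neighborhood)
        spread.append((math.sqrt(mean_sq), d))
    return sorted(d for _, d in sorted(spread)[:n_dims])

def subspace_distance(p, medoid, dims):
    """Average absolute difference over the medoid's selected dimensions."""
    return sum(abs(p[d] - medoid[d]) for d in dims) / len(dims)

def assign(points, medoids, subspaces):
    """Assign each point to the medoid closest in that medoid's subspace."""
    labels = []
    for p in points:
        dists = [subspace_distance(p, m, dims) for m, dims in zip(medoids, subspaces)]
        labels.append(dists.index(min(dists)))
    return labels

# Toy data: the first four points are tight in dimensions 0 and 1,
# the last four are tight in dimensions 1 and 2.
points = [
    (0.1, -0.1, 2.0), (-0.1, 0.1, -3.0), (0.0, 0.0, 0.0), (0.1, 0.0, 1.0),
    (2.0, 20.1, -0.1), (-3.0, 19.9, 0.1), (0.0, 20.0, 0.0), (1.0, 20.0, 0.1),
]
medoids = [(0.0, 0.0, 0.0), (0.0, 20.0, 0.0)]

subspaces = [
    select_dimensions(local_neighborhood(points, medoids, i), medoids[i], n_dims=2)
    for i in range(len(medoids))
]
labels = assign(points, medoids, subspaces)
print(subspaces, labels)
```

In a full method, this pass would be repeated, with candidate medoids swapped in and kept only when they improve a clustering quality measure.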
Correlation-Based Clustering Methods
While subspace search methods search for clusters with a similarity that is measured
using conventional metrics like distance or density, correlation-based approaches can
further discover clusters that are defined by advanced correlation models.
Example 11.11 A correlation-based approach using PCA. As an example, a PCA-based approach first
applies PCA (Principal Components Analysis; see Chapter 3) to derive a set of new,
uncorrelated dimensions, and then mines clusters in the new space or its subspaces. In
addition to PCA, other space transformations may be used, such as the Hough transform
or fractal dimensions.
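The two-step pattern (transform, then cluster) can be illustrated with a small standard-library sketch. It rests on several assumptions: only the leading principal component is computed, using power iteration on the sample covariance matrix, and the clustering step is a deliberately naive split at the largest gap in the projected values, standing in for whatever clustering method would really be applied in the new space. The function names and data are hypothetical.

```python
import math

def center_and_covariance(rows):
    """Return mean-centered rows and their sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centered, cov

def leading_component(cov, iterations=200):
    """Approximate the first principal component by power iteration.

    Assumes the start vector is not orthogonal to that component.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def split_at_largest_gap(values):
    """Two-way split of 1-D values at the widest gap (stand-in clusterer)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for i in order[cut + 1:]:
        labels[i] = 1
    return labels

# Toy data: two groups lying along a strongly correlated diagonal.
data = [(1.0, 1.0), (2.0, 2.0), (1.5, 1.6), (8.0, 8.0), (9.0, 9.1), (8.5, 8.4)]

centered, cov = center_and_covariance(data)
pc1 = leading_component(cov)
projected = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
labels = split_at_largest_gap(projected)
print(labels)
```

Projecting onto further components would give the remaining uncorrelated dimensions; clusters could then be mined in the full transformed space or in a subset of its dimensions.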
For additional details on subspace search methods and correlation-based clustering
methods, please refer to the bibliographic notes (Section 11.7).
Biclustering Methods
In some applications, we want to cluster both objects and attributes simultaneously.
The resulting clusters are known as biclusters and meet four requirements: (1) only a
small set of objects participates in a cluster; (2) a cluster involves only a small number of
attributes; (3) an object may participate in multiple clusters or in no cluster at all;
and (4) an attribute may be involved in multiple clusters or in no cluster at all.
Section 11.2.3 discusses biclustering in detail.
 