Recently proposed algorithms like SURFING (Subspaces Relevant for Clustering) (Baumgartner et al. 2004), SCHISM (Support and Chernoff-Hoeffding Bound-based Interesting Subspace Miner) (Sequeira & Zaki 2004) and DUSC (Dimensionality-Unbiased Subspace Clustering) (Assent et al. 2007) refrain from a cluster notion based on a monotonic, fixed density criterion, since such a criterion leads to major problems. With increasing dimensionality, the object density naturally decreases. A fixed density criterion therefore implicitly specifies the dimensionality of the subspace clusters that the algorithm can detect. This makes parameterization difficult, and subspace clusters of varying dimensionality cannot be detected in a single run of the algorithm. However, once the fixed density criterion is dropped, the monotonicity property is lost as well and heuristic search strategies are required. SURFING and DUSC propose criteria that rate the interestingness of subspaces for clustering based on statistics of the data distribution, which allows subspace clusters of varying dimensionality to be detected. SCHISM also employs a variable density criterion and heuristic search, but uses a grid-based data quantization like CLIQUE.
All described algorithms apply bottom-up search,
which implies that at least parts of the subspace
clusters must be visible in the one-dimensional
subspaces.
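The effect of a fixed density criterion can be illustrated with a small calculation. The following Python sketch is illustrative only and not taken from the cited papers; the data size, cell width and threshold are hypothetical. It shows that, for uniformly distributed data under a CLIQUE-style grid quantization, the expected number of points per grid cell shrinks exponentially with the dimensionality of the subspace, so a fixed density threshold can only be exceeded up to a certain dimensionality.

# Illustrative sketch (not from the cited papers): with cell width w per
# dimension, the expected number of uniformly distributed points per cell
# shrinks by a factor of w for every added dimension, so a fixed density
# threshold implicitly bounds the dimensionality of detectable clusters.

def expected_points_per_cell(n, w, dims):
    """Expected points in one grid cell of a dims-dimensional subspace."""
    return n * (w ** dims)

n, w, threshold = 10000, 0.1, 5.0   # hypothetical data size, cell width, threshold
for dims in range(1, 7):
    e = expected_points_per_cell(n, w, dims)
    status = "dense" if e >= threshold else "not dense"
    print("%d-dimensional cell: expected %.3f points (%s)" % (dims, e, status))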
An alternative solution to the curse of dimen-
sionality in clustering is projected clustering. In
contrast to subspace clustering, algorithms for
projected clustering assign each data object to
only one distinct cluster and determine the best
subspace, or the best projection, for this cluster.
Instead of performing a bottom-up search, most
algorithms for projected clustering start in the full
dimensional space and, similar to K-means, itera-
tively optimize some objective function. The first
approach to projected clustering is the algorithm
PROCLUS (Projected Clustering) (Aggarwal et
al. 1999). Provided with the input parameters K and l, PROCLUS returns a partitioning of the data into K clusters having an average dimensionality l. Objects that do not fit well into any of the clusters
are assigned to noise. To achieve comparability
of distances among objects assigned to subspace
clusters of different subspace dimensionality, the
Manhattan distance is normalized by the subspace
dimensionality. In the initialization stage, PRO-
CLUS selects initial medoids as cluster representa-
tives from the data objects. In the iterative phase,
the set of medoids and their associated subspaces
are refined using a greedy hill-climbing technique.
After this iterative search, an additional pass over
the data is performed for refinement of clusters,
medoids and associated subspaces.
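The distance normalization mentioned above can be sketched as follows; the function and variable names are illustrative and not taken from the PROCLUS paper.

# Minimal sketch of a dimensionality-normalized Manhattan distance: the
# Manhattan distance restricted to a cluster's associated subspace,
# divided by the number of dimensions of that subspace.

def normalized_manhattan(x, y, subspace):
    """Manhattan distance over the given dimensions, divided by their number."""
    if not subspace:
        raise ValueError("subspace must contain at least one dimension")
    return sum(abs(x[d] - y[d]) for d in subspace) / len(subspace)

# Distances to medoids with subspaces of different dimensionality become comparable:
point    = [0.2, 0.9, 0.4, 0.7]
medoid_a = [0.1, 0.8, 0.5, 0.0]   # associated subspace: dimensions 0, 1, 2
medoid_b = [0.0, 0.0, 0.0, 0.6]   # associated subspace: dimension 3 only
print(normalized_manhattan(point, medoid_a, [0, 1, 2]))
print(normalized_manhattan(point, medoid_b, [3]))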
Procopiuc et al. (2002) introduce an alterna-
tive notion of projected clusters, called optimal
projective clusters, together with an algorithm
called DOC (Density-based Optimal Projective
Clustering) to find such clusters. An optimal pro-
jective cluster is defined using two parameters:
α specifying a fraction of the data objects and ω
specifying the width of a hypercube. An optimal projective cluster is defined as a set of points C associated with a set of dimensions D such that C contains at least a fraction α of the data objects and the projection of C onto the subspace spanned by D is contained in a hypercube of width ω. Based
on Monte-Carlo sampling, the algorithm DOC
returns approximations of the optimal projective
clusters. DOC does not require the user to specify
K, but the parameterization remains difficult
for the same reasons as discussed for subspace
clustering with a fixed density criterion. In ad-
dition, the cluster notion of DOC does not take the data distribution inside the hypercube representing a cluster into account. The hypercube does not always contain exactly one cluster; it may also contain several clusters, noise points and empty space.
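The Monte-Carlo idea behind DOC can be sketched as follows: a single trial samples a seed point and a small discriminating set, derives the dimensions in which the discriminating set stays within ω of the seed, and tests whether the resulting hypercube captures at least a fraction α of the data. This is a simplified sketch; the sampling sizes, the treatment of the hypercube width and the scoring of candidates are assumptions and do not follow the original algorithm exactly.

# Simplified sketch of one Monte-Carlo trial in the spirit of DOC; sampling
# sizes and the acceptance test are assumptions, not DOC's exact choices.
import random

def monte_carlo_trial(data, alpha, omega, discriminating_size=5):
    seed = random.choice(data)                          # random seed point
    sample = random.sample(data, discriminating_size)   # discriminating set
    dims = range(len(seed))
    # Dimensions in which all sampled points stay within omega of the seed
    # are treated as relevant for the candidate cluster.
    relevant = [d for d in dims
                if all(abs(p[d] - seed[d]) <= omega for p in sample)]
    if not relevant:
        return None
    # Candidate cluster: all points inside the hypercube around the seed
    # in the relevant dimensions.
    candidate = [p for p in data
                 if all(abs(p[d] - seed[d]) <= omega for d in relevant)]
    # Accept if the candidate contains at least a fraction alpha of all objects.
    if len(candidate) >= alpha * len(data):
        return candidate, relevant
    return None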
The algorithm PreDeCon (Subspace Preference Weighted Density-Connected Clustering) (Böhm
et al. 2004) partially avoids the problems of dif-
ficult parameterization and undesired behavior
in the presence of noise by extending the basic
concepts of density-based clustering to projected
clustering. In this approach, the notion of subspace