Database Reference
In-Depth Information
are located in the neighbourhood, and an incor-
rect subspace dimension may be determined. In
contrast, if ε is chosen too small, no core objects
may be found. A hierarchical extension of 4C
called HiCO which is based on the hierarchical
density-based clustering notion of OPTICS (cf.
the background section) and PCA has also been
proposed (Achtert et al. 2006). A further exten-
sion, DiSH (Achtert et al. 2007) considers more
general, network-based relationships between
correlation clusters.
In (Tung et al. 2006) the authors propose
CURLER, a method for finding and visualiz-
ing clusters even with a nonlinear correlation.
CURLER uses the concept of micro-clusters that
are generated using a variant of EM clustering. The
micro-clusters are merged to discover correlation
clusters. The merging of micro-clusters is based
on the concept of co-sharing, i.e. the co-sharing
level of two micro-clusters corresponds to the total
amount of points which are associated to both
clusters simultaneously. From the set of all micro-
clusters, an arbitrary starting object is selected,
and, like in OPTICS or Single Link, in each step
that cluster is selected as the next micro-cluster
in the micro-cluster order which has the highest
co-sharing level to the starting object or any of
the previously placed micro-clusters. CURLER
also defines a visualization technique similar to
the reachability-plot of OPTICS. CURLER im-
proves over ORCLUS and 4C as the correlations
underlying the clusters are not necessarily linear.
Thus, also clusters like the example in Figure 1(c)
can be detected. Furthermore, as a fuzzy approach,
CURLER assumes each data object to belong
to all clusters simultaneously, but with different
probabilities for each cluster assigned. By merg-
ing several clusters according to their co-sharing
level, the algorithm on the one hand becomes less
sensitive to the predefined number K of clusters,
thus also overcoming a severe limitation of any
K-means related approach. On the other hand, the
user cannot directly derive a model describing the
correlations, since the original K models are no
longer present in the resulting clustering. There-
fore, one can say that the algorithm can handle
arbitrary non-linear correlations but is not able to
identify the type of correlation.
The most important problem of all approaches
previously presented in this section is the miss-
ing robustness with respect to noise objects and
correlation clusters which are close to each other.
All the applied techniques, iterative methods,
density-based methods and micro-clusters are to
some degree sensitive to the situation when the
correlation cannot be determined by looking at the
full-dimensional local neighbourhood of an object.
The algorithm CASH (Clustering inArbitrary Sub-
spaces based on the Hough transform) presented
in (Achtert et al. 2008) tackles this problem by
a sophisticated parameter space transformation:
Instead of clustering the points directly, the idea is
first to replace each point by the (infinite) set of all
possible, λ-dimensional planes in which the point
is contained. For low (two or three) dimensional
applications from image processing domains, this
idea is already well-known, and denoted by the
term Hough transform. In these low dimensional
domains, the infinite set of all lines or planes can
easily be discretely represented by a finite selec-
tion of some possible sample planes (for example
sampled by discrete angles). Matching planes can
easily be determined by counting of accumula-
tors, where each accumulator corresponds to one
possible plane. In higher dimensional spaces, a
sufficient discretization of all possible planes is
not possible due to the curse of dimensionality.
But the set of all possible planes passing through a
point can also be represented by a function. If the
planes are represented by Euclidean coordinates,
these functions are linear functions. If the planes
are represented by spherical coordinates (which
is advantageous because orthographic planes can-
not be represented in Euclidean coordinates) then
the functions are trigonometric (combinations of
sine and cosine functions). Points sharing a com-
mon plane are determined by those trigonometric
functions which have a common intersection at an
Search WWH ::




Custom Search