Database Reference
In-Depth Information
Figure 8. Parameter-free clustering. Coding scheme for cluster objects of the RIC algorithm. In addition
to the data, the type and parameters of the PDF need to be coded for each cluster. (Figure from Böhm
et al. 2006)
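The idea behind such a coding scheme can be illustrated with a minimal sketch. It assumes a generic likelihood-based code in which a coordinate value costs about -log2(pdf(value) * resolution) bits, plus a flat charge for the PDF type and parameters. The function names, the grid `resolution`, and `param_bits` are hypothetical and chosen for illustration; this is not the RIC implementation.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of a Laplacian with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def coding_cost_bits(values, pdf, resolution=0.01, param_bits=0.0):
    """Bits needed to encode `values` on a grid of width `resolution`.

    Each value costs -log2(pdf(value) * resolution) bits. `param_bits` is a
    hypothetical flat charge for storing the PDF type and its parameters.
    """
    tiny = 1e-300  # guards against log2(0) for extremely unlikely values
    data_bits = sum(-math.log2(max(pdf(v) * resolution, tiny)) for v in values)
    return data_bits + param_bits

def cheapest_pdf(values, candidates, resolution=0.01):
    """Return the name of the candidate PDF with the lowest coding cost."""
    return min(
        candidates,
        key=lambda name: coding_cost_bits(values, candidates[name], resolution),
    )

if __name__ == "__main__":
    candidates = {"gaussian": gaussian_pdf, "laplace": laplace_pdf}
    sample = [0.0, 5.0]  # one typical value and one far in the tail
    for name, pdf in candidates.items():
        print(name, round(coding_cost_bits(sample, pdf), 2), "bits")
    print("cheapest:", cheapest_pdf(sample, candidates))
```

Likely values receive short codes and unlikely values long ones, so the candidate PDF that fits the data best also compresses it best. A fixed candidate set such as `candidates` has to be chosen up front.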
compression can be achieved if the value of a
coordinate is encoded with a bit string whose length
decreases with its likelihood, so that likely values
receive short codes. In a first step,
RIC removes noise objects from the initial clusters;
it then merges clusters whenever this allows for more
effective data compression. The algorithm can
operate with arbitrary data distributions that
can be described by PDFs. However, a fixed set
of PDFs needs to be selected in advance. The
algorithm OCI (Outlier-robust Clustering using
Independent Components) (Böhm et al. 2008)
provides parameter-free clustering of noisy data
and allows detecting non-Gaussian clusters with
non-orthogonal major directions, such as the example
in Figure 1(d). Technically, this is achieved by
defining a very general cluster notion based on
the Exponential Power Distribution (EPD) and
by integrating Independent Component Analysis
(ICA) into clustering. The EPD includes a wide
range of symmetric distribution functions, for
example Gaussian, Laplacian and uniform distributions,
and an infinite number of hybrid types in
between. Beyond the correlations detected by PCA,
which correspond to correlation clusters with
orthogonal major directions, ICA can detect
general statistical dependencies in data.
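How a single family can cover Gaussian, Laplacian and uniform shapes can be seen in a short sketch. It assumes the common parameterization of the exponential power (generalized normal) density with location `mu`, scale `alpha` and shape `beta`. These names and this parameterization are assumptions for illustration and may differ from the one used in OCI.

```python
import math

def epd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Exponential power density with location mu, scale alpha, shape beta.

    f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    Intended for moderate inputs; very large |x - mu| / alpha combined with a
    large beta can overflow the floating-point power.
    """
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))

if __name__ == "__main__":
    x = 0.3
    # beta = 1 gives a Laplacian with scale alpha.
    print("laplacian-like:", epd_pdf(x, alpha=1.0, beta=1.0))
    # beta = 2 gives a Gaussian with standard deviation alpha / sqrt(2).
    print("gaussian-like: ", epd_pdf(x, alpha=math.sqrt(2.0), beta=2.0))
    # A large beta approaches a uniform density on [mu - alpha, mu + alpha].
    print("uniform-like:  ", epd_pdf(x, alpha=1.0, beta=50.0))
```

Shape values between these settings yield the hybrid types mentioned above; only `beta` changes, while the functional form stays the same.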
Future Trends
In this section, we point out some trends
that we believe will attract
even more attention in the future; one is clustering
of uncertain data. Uncertainty is a natural
element in many applications, for example due
to the limited resolution and accuracy of data
acquisition techniques or due to the use
of aggregated features. Sometimes uncertainty is
even deliberately introduced, for example by adding
small perturbations to the data to mask sensitive
features in privacy-preserving data mining
(Aggarwal 2007). Some recent papers focus on