preference clusters is introduced, which are defined as density-based
clusters in subspaces. The dimensions of the subspace of a cluster are
determined by selecting directions of low variance within the
ε-neighbourhood of core objects.
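Selecting directions of low variance within the ε-neighbourhood of a core object, as described above, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular published algorithm.

Several choices here are hypothetical and made only for this example: the helper names, the parameters `eps`, `min_pts` and `delta`, the DBSCAN-style core test, and the restriction of the PCA part to two dimensions. The two functions look at the neighbourhood in different ways:

- `low_variance_attributes` inspects only the original attributes, which corresponds to an axis-parallel subspace.
- `weakest_direction_2d` runs a small local Principal Component Analysis, so it can also find a low-variance direction that is not parallel to any axis.

```python
import math

# Hypothetical helpers for illustration; eps, min_pts and delta are
# illustrative parameters, not values taken from any specific algorithm.
# All functions assume p is itself one of the points, so N_eps(p) is non-empty.

def eps_neighbourhood(points, p, eps):
    """Return all points within Euclidean distance eps of p (p included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_object(points, p, eps, min_pts):
    """DBSCAN-style core test: at least min_pts points in N_eps(p)."""
    return len(eps_neighbourhood(points, p, eps)) >= min_pts

def attribute_variances(neigh):
    """Population variance of every attribute (axis) over a point set."""
    n, d = len(neigh), len(neigh[0])
    means = [sum(q[j] for q in neigh) / n for j in range(d)]
    return [sum((q[j] - means[j]) ** 2 for q in neigh) / n for j in range(d)]

def low_variance_attributes(points, p, eps, delta):
    """Axis-parallel view: indices of attributes whose variance inside
    N_eps(p) is at most delta."""
    variances = attribute_variances(eps_neighbourhood(points, p, eps))
    return [j for j, v in enumerate(variances) if v <= delta]

def weakest_direction_2d(points, p, eps):
    """Arbitrarily oriented view (2-D only): local PCA of N_eps(p).
    Returns the unit eigenvector of the neighbourhood covariance matrix
    with the smallest eigenvalue, together with that eigenvalue."""
    neigh = eps_neighbourhood(points, p, eps)
    sxx, syy = attribute_variances(neigh)
    n = len(neigh)
    mx = sum(q[0] for q in neigh) / n
    my = sum(q[1] for q in neigh) / n
    sxy = sum((q[0] - mx) * (q[1] - my) for q in neigh) / n
    # Closed-form smallest eigenvalue of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy * sxy
    lam_min = half_trace - math.sqrt(max(half_trace ** 2 - det, 0.0))
    if abs(sxy) > 1e-12:
        # Solves (sxx - lam_min) * v0 + sxy * v1 = 0.
        v = (sxy, lam_min - sxx)
    else:
        # Diagonal covariance: the weakest direction is an axis.
        v = (1.0, 0.0) if sxx <= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm), lam_min

if __name__ == "__main__":
    origin = (0.0, 0.0)
    diagonal = [(float(i), float(i)) for i in range(-5, 6)]  # points on y = x
    flat = [(float(i), 0.0) for i in range(-5, 6)]           # points on y = 0
    print(low_variance_attributes(diagonal, origin, 10.0, 1.0))  # []
    print(low_variance_attributes(flat, origin, 10.0, 1.0))      # [1]
    print(weakest_direction_2d(diagonal, origin, 10.0))
```

On the diagonal sample, neither attribute has low variance, so the axis-parallel test selects nothing. Local PCA, by contrast, reports a direction of zero variance orthogonal to the line y = x. This contrast illustrates why correlation analysis for arbitrarily oriented clusters is usually based on PCA rather than on per-attribute variances.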

Correlation Clustering

The detection of correlations between different features in a given
data set is a very important data mining task. High correlation of
features may result in a high degree of collinearity, or even perfect
collinearity, corresponding to approximate or exact linear dependencies
between two or more attributes. These dependencies can be arbitrarily
complex: one or more features might depend on a combination of several
other features. In the data space, dependencies of features manifest
as lines, planes or, more generally, hyperplanes exhibiting a
relatively high density of data objects compared to the surrounding
space. See Figure 1(b) for an example of a correlation cluster in
two-dimensional space. Knowledge of correlations is traditionally used
to reduce the dimensionality of a data set by eliminating redundant
features. However, detecting correlated features may also help to
reveal hidden causalities that are of great interest to the domain
expert. Recently, correlation clustering has been introduced as a
novel concept of knowledge discovery in databases to detect
dependencies among features and to cluster data objects sharing a
common pattern of dependencies. It combines two widespread ideas:
first, correlation analysis, usually performed by Principal Component
Analysis (PCA), and second, clustering, which aims at identifying local
subgroups of data objects sharing high similarity. Correlation
clustering groups the data set into subsets called correlation
clusters, such that the objects in the same correlation cluster are
all associated with a common hyperplane of arbitrary dimensionality.
In addition, many algorithms for correlation cluster analysis also
require the objects of a cluster to exhibit a certain density, i.e.
feature similarity. When comparing correlation clustering with
subspace and projected clustering, we can observe that in correlation
clustering the clusters exist in an arbitrarily oriented subspace
rather than in an axis-parallel one. Therefore, correlation clustering
is sometimes also referred to as generalized subspace clustering.

Correlation clustering has been successfully applied in several
application domains. For example, customer recommendation systems are
important tools for target marketing. For data analysis in
recommendation systems, it is important to find homogeneous groups of
users with similar ratings in subsets of the attributes. In addition,
it is interesting to find groups of users with correlated affinities.
This knowledge can help companies to predict customer behaviour and
thus develop future marketing plans. In molecular biology, correlation
clustering is an important method for the analysis of several types of
data. In metabolic screening, the collected data usually contain the
concentrations of certain metabolites in the blood of thousands of
patients. In such data sets, it is important to find homogeneous
groups of patients with correlated metabolite concentrations
indicating a common metabolic disease. Within such a group, several
metabolites may be linearly dependent on several other metabolites.
Uncovering these patterns and extracting the dependencies within these
clusters is a key step towards understanding metabolic or genetic
disorders and designing individualized drugs. A second example where
correlation clustering is a sound methodology for data analysis in
molecular biology is DNA microarray data analysis. Microarray data
comprise the expression levels of thousands of genes in different
samples, such as experimental conditions, cells or organisms. Roughly
speaking, the expression level of a gene indicates how active this
gene is. Recovering dependencies among different genes under certain
conditions is an important step towards a more comprehensive
understanding of the functionality of organisms.