These subspaces are hence expected to prove interesting for various density-based
clustering algorithms. While monotonicity of this quality criterion is not proven in
general, we do know that the core point property is anti-monotonic:
S ⊆ T : o is not a core point in S ⇒ o is not a core point in T.    (16.6)
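The anti-monotonicity of the core point property can be illustrated with a minimal sketch, assuming a DBSCAN-style core point definition (at least `min_pts` objects in the ε-neighborhood) and Euclidean distances; the function name and parameter values are illustrative, not from the original algorithms:

```python
import numpy as np

def is_core_point(data, idx, dims, eps=0.5, min_pts=5):
    """DBSCAN-style core point test for data[idx] in the subspace
    spanned by the attribute indices in `dims`."""
    proj = data[:, dims]                            # project onto the subspace
    dist = np.linalg.norm(proj - proj[idx], axis=1)
    return np.sum(dist <= eps) >= min_pts           # neighborhood incl. the point itself

# Adding attributes can only increase Euclidean distances, so neighborhoods
# shrink: a point that is no core point in S stays no core point in any
# superspace T of S -- this is what licenses Apriori-style candidate pruning.
rng = np.random.default_rng(0)
data = rng.random((200, 4))
S, T = [0, 1], [0, 1, 2]                            # S is a subspace of T
for i in range(len(data)):
    if not is_core_point(data, i, S):
        assert not is_core_point(data, i, T)        # never violated
```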
A similar approach, SURFING (SUbspaces Relevant For clusterING) [15], also following a density-based notion of 'interestingness', assesses the variance of k-nearest neighbor distances. A d-dimensional subspace can be rated as 'interesting', 'neutral', or 'irrelevant' in comparison to its (d − 1)-dimensional subspaces, but this rating is not monotone. Accordingly, the subspace search of SURFING follows the Apriori idea of early pruning of candidates only heuristically but does not formally implement the strictly anti-monotonic candidate elimination.
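The intuition behind rating subspaces by the variance of k-nearest neighbor distances can be sketched as follows; this is a deliberate simplification of the idea, not the exact SURFING measure, and the scoring function is an illustrative stand-in:

```python
import numpy as np

def knn_dist_score(data, dims, k=5):
    """Score a subspace by the variation of k-nearest-neighbor distances
    (simplified): uniform data yields similar k-NN distances everywhere,
    while clustered data mixes dense and sparse regions, so the
    distances vary strongly relative to their mean."""
    proj = data[:, dims]
    # full pairwise distance matrix in the subspace projection
    d = np.linalg.norm(proj[:, None, :] - proj[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, k]          # distance to the k-th neighbor
    return np.mean(np.abs(knn - knn.mean())) / knn.mean()

rng = np.random.default_rng(1)
uniform = rng.random((300, 2))                           # no structure
clustered = np.vstack([rng.normal(0.5, 0.01, (150, 2)),  # dense cluster
                       rng.random((150, 2))])            # plus noise
# the clustered sample receives a clearly higher score
assert knn_dist_score(clustered, [0, 1]) > knn_dist_score(uniform, [0, 1])
```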
CMI (Cumulative Mutual Information) [66] is a measure to assess the correlation among the attributes of some subspace; it is used to identify subspaces that are interesting w.r.t. a high contrast, that is, subspaces likely to contain distinct clusters. The authors assume monotonicity of this contrast criterion to facilitate a candidate-elimination-based search starting with two-dimensional subspaces. As a priority search, generating candidates from the top m subspaces only, their algorithm is more efficient than the Apriori search at the expense of completeness of the results. Finally, a subspace contained in another subspace reported as a result is dropped from the resulting set if the higher-dimensional subspace also has higher contrast.
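The shape of such a top-m priority search can be sketched generically; the `contrast` argument below is a stand-in for the CMI measure, and all names and parameter values are illustrative, not the paper's:

```python
from itertools import combinations

def priority_subspace_search(dims, contrast, m=2, max_dim=3):
    """Sketch of a top-m priority search: start from all two-dimensional
    subspaces, then repeatedly extend only the m highest-contrast
    candidates by one attribute.  Faster than full Apriori enumeration,
    but not complete."""
    candidates = [frozenset(c) for c in combinations(dims, 2)]
    results = {}
    while candidates:
        top = sorted(candidates, key=contrast, reverse=True)[:m]
        results.update((s, contrast(s)) for s in top)
        candidates = list({s | {d} for s in top for d in dims
                           if d not in s and len(s) < max_dim} - results.keys())
    # drop a subspace if a higher-dimensional result reaches its contrast
    return [s for s, sc in results.items()
            if not any(s < t and results[t] >= sc for t in results)]

# toy contrast function: just the sum of the attribute indices
found = priority_subspace_search(range(4), contrast=sum, m=2, max_dim=3)
```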
Local Subspace Search Subspace search has also been incorporated locally into clustering algorithms. DiSH [1], similar to its predecessor HiSC, follows a pattern of cluster search that is different from the Apriori-based subspace clustering idea discussed so far. Appropriate subspaces for distance computations are learned locally for each point; then the locally adapted (subspace-) distances and the dimensionality of the assigned subspace are used as a combined distance measure in a global clustering schema similar to OPTICS [8] to find hierarchies of subspaces.
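A simplified sketch of such a combined distance measure follows; the exact definitions of the preference vectors and distance components in DiSH differ, so everything here is illustrative:

```python
import math

def subspace_distance(p, q, pref_p, pref_q):
    """Simplified combined distance in the spirit of DiSH: the primary
    component is the number of dimensions NOT shared by the two points'
    subspace preference vectors, the secondary component is the Euclidean
    distance within the shared subspace.  Pairs compare lexicographically,
    which fits an OPTICS-like reachability ordering."""
    shared = [a and b for a, b in zip(pref_p, pref_q)]
    d1 = len(p) - sum(shared)                 # dimensions in disagreement
    d2 = math.sqrt(sum((a - b) ** 2           # distance in shared subspace
                       for a, b, s in zip(p, q, shared) if s))
    return (d1, d2)

# q is relevant for p only in attribute 0: one dimension contradicts,
# and the distance within the shared subspace is 3.0
print(subspace_distance([0.0, 0.0], [3.0, 9.0], [True, True], [True, False]))
# → (1, 3.0)
```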
For learning the most appropriate subspace for each data point, both HiSC and DiSH assign a 'subspace preference vector' to each object, based on the variance of the neighborhood in each attribute. As such, the clustering procedure does not make use of an efficient frequent pattern search algorithm. However, while HiSC uses the full-dimensional neighborhood and studies the variances of the neighborhoods in attribute-wise projections, DiSH starts with attribute-wise neighborhoods and combines those neighborhoods in a bottom-up procedure. Here, an Apriori-like search strategy is one of the suggested alternatives, employing the monotonicity of
neighborhoods in projections of the data. If S is a subspace of T, then the cardinality of the ε-neighborhood of some object o in T is bound to be at most the cardinality of the ε-neighborhood of the same object o in S:

S ⊆ T ⇒ |N_ε^T(o)| ≤ |N_ε^S(o)|    (16.7)
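This cardinality bound, and the bottom-up combination of attribute-wise neighborhoods, can be demonstrated in a small sketch. Using the maximum norm is a simplifying choice made here (not necessarily the book's setting): under it, the neighborhood in a combined subspace is exactly the intersection of the attribute-wise neighborhoods.

```python
import numpy as np

def eps_neighborhood(data, idx, dims, eps=0.1):
    """ε-neighborhood of data[idx] in the subspace `dims`, using the
    maximum norm so that combining attributes amounts to intersecting
    the attribute-wise neighborhoods."""
    diff = np.abs(data[:, dims] - data[idx, dims])
    return set(np.where(diff.max(axis=1) <= eps)[0])

rng = np.random.default_rng(2)
data = rng.random((100, 3))
n0 = eps_neighborhood(data, 0, [0])
n1 = eps_neighborhood(data, 0, [1])
n01 = eps_neighborhood(data, 0, [0, 1])
# bottom-up combination: intersecting the 1-d neighborhoods gives the 2-d one
assert n01 == n0 & n1
# the cardinality bound of (16.7): neighborhoods only shrink in superspaces
assert len(n01) <= len(n0) and len(n01) <= len(n1)
```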