Database Reference
In-Depth Information
Table 16.1 Translating
frequent pattern mining
concepts to subspace
clustering
Frequent pattern mining
Subspace clustering
Item
Dimension (attribute)
Itemset
Subspace (set of attributes)
Frequent itemset
Subspace (unit) containing cluster
Consequently, a set of points cannot form a cluster in some space
T
, if it does not
also form a cluster in every subspace of
. Or, formulated as anti-monotone property
that we can use to prune candidate subspaces:
T
S
does not contain any cluster
⇒∀
superspaces
T S
:
(16.3)
T
cannot contain a cluster either .
As a result, an algorithm for subspace clustering can identify all clusters in all
1-dimensional subspaces, continue to look for clusters in only those 2-dimensional
subspaces that have a 1-dimensional subspace containing some cluster, and so on,
following the candidate-pruning heuristic of the Apriori algorithm. Hence we see
that the 'items' of Apriori translate to dimensions, 'itemsets' translate to subspaces,
and 'frequent itemset' according to some frequency threshold translates to 'subspace
contains some cluster' according to some clustering criterion. See Table 16.1 for a
summary of this translation. This transfer of concepts requires the anti-monotonicity
to hold for the clustering criterion used.
Note that the monotonicity does not hold in general for arbitrary cluster paradigms,
but instead depends on the particular cluster model used. The example used here
(monotonicity of density-based clusters, Fig. 16.4 ) has been proven for the subspace
clustering approach SUBCLU [ 48 ]. However, the very idea of using a monotonicity
for some cluster criterion has been used for different clustering models several times,
following the seminal approach of CLIQUE [ 7 ]. We detail the specific adaptations
for different clustering models in the next section.
Subspaces as Patterns In the second main variant, the setup is slightly modified,
and the goal is to identify subspaces as a prerequisite for the final clustering result.
These subspaces can be used in quite different ways in connection with clustering
algorithms. For example, after identification of subspaces, traditional clustering
algorithms are applied to find clusters within these subspaces, or distance measures
can be adapted to these subspaces in the actual clustering procedure, or clusters and
corresponding subspaces are refined iteratively. As such, in contrast to the setting
above, here one does not identify whether subspaces are 'interesting' by the clusters
they contain (which is specific to a particular clustering model), but rather defines
'interesting' more generally, for example in terms of how strongly these attributes
interact.
In subspace search, just as in subspace clustering, the 'items' and 'itemsets' con-
cepts from frequent pattern mining translate nicely to 'dimension' and 'subspace',
respectively. The notion of a 'frequent itemset' according to some frequency thresh-
old translates different here, namely to 'interesting subspace' according to some
 
Search WWH ::




Custom Search