Frequent Pattern Mining Algorithms for Data Clustering - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 16.1 Translating

frequent pattern mining

concepts to subspace

clustering

Frequent pattern mining

Subspace clustering

Item

Dimension (attribute)

Itemset

Subspace (set of attributes)

Frequent itemset

Subspace (unit) containing cluster

Consequently, a set of points cannot form a cluster in some space

T

, if it does not

also form a cluster in every subspace of

. Or, formulated as anti-monotone property

that we can use to prune candidate subspaces:

T

S

does not contain any cluster

⇒∀

superspaces

T ⊇ S

:

(16.3)

T

cannot contain a cluster either .

As a result, an algorithm for subspace clustering can identify all clusters in all

1-dimensional subspaces, continue to look for clusters in only those 2-dimensional

subspaces that have a 1-dimensional subspace containing some cluster, and so on,

following the candidate-pruning heuristic of the Apriori algorithm. Hence we see

that the 'items' of Apriori translate to dimensions, 'itemsets' translate to subspaces,

and 'frequent itemset' according to some frequency threshold translates to 'subspace

contains some cluster' according to some clustering criterion. See Table 16.1 for a

summary of this translation. This transfer of concepts requires the anti-monotonicity

to hold for the clustering criterion used.

Note that the monotonicity does not hold in general for arbitrary cluster paradigms,

but instead depends on the particular cluster model used. The example used here

(monotonicity of density-based clusters, Fig. 16.4 ) has been proven for the subspace

clustering approach SUBCLU [ 48 ]. However, the very idea of using a monotonicity

for some cluster criterion has been used for different clustering models several times,

following the seminal approach of CLIQUE [ 7 ]. We detail the specific adaptations

for different clustering models in the next section.

Subspaces as Patterns In the second main variant, the setup is slightly modified,

and the goal is to identify subspaces as a prerequisite for the final clustering result.

These subspaces can be used in quite different ways in connection with clustering

algorithms. For example, after identification of subspaces, traditional clustering

algorithms are applied to find clusters within these subspaces, or distance measures

can be adapted to these subspaces in the actual clustering procedure, or clusters and

corresponding subspaces are refined iteratively. As such, in contrast to the setting

above, here one does not identify whether subspaces are 'interesting' by the clusters

they contain (which is specific to a particular clustering model), but rather defines

'interesting' more generally, for example in terms of how strongly these attributes

interact.

In subspace search, just as in subspace clustering, the 'items' and 'itemsets' con-

cepts from frequent pattern mining translate nicely to 'dimension' and 'subspace',

respectively. The notion of a 'frequent itemset' according to some frequency thresh-

old translates different here, namely to 'interesting subspace' according to some

Frequent Pattern Mining

Search WWH ::

Custom Search

Home