Applications of Frequent Pattern Mining - Frequent Pattern Mining


applications such as grouping Web transactions [135]. Subsequently, a very large

number of methods have been designed for clustering high-dimensional data with

the use of pattern-based methods. A detailed discussion of the connections between

such high-dimensional clustering algorithms and the frequent pattern mining problem

may be found in the survey article [106] and in the chapter on high-dimensional data in [4].

A second problem is the use of pattern mining methods for clustering data with discrete

attributes, as in the case of biological data. Clusters can be considered an orthogonal

representation of the localized associations, as is the case for all subspace

clustering methods. Such a technique for finding localized associations and clusters

simultaneously is discussed in [9]. In this work, it is shown that localized associations

can be enhanced when local regions of the data are explored simultaneously

with the association analysis process. At the same time, the clustering process is

enhanced as well. This is also the general principle in many clustering methods such

as matrix factorization and co-clustering [4]. Biological data is often represented as

a sequence of discrete values corresponding to the amino acids or the DNA/RNA

bases. The sequences are usually too long to be clustered by similarity computations

alone. Therefore, pattern mining or motif mining can be very useful in

these cases. An example of a sequence-based clustering approach is the CLUSEQ

method [136]. A common class of algorithms in this context is biclustering,

in which clusters are constructed from frequent patterns in biological data [93, 99].

An excellent survey on biclustering methods may be found in [93]. The problem

of motif discovery is very closely related to that of clustering in such domains. A

discussion of different methods that connect the frequent pattern mining problem

to the clustering problem in the context of biological data may be found in [4].
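
To make the idea of pattern-based sequence clustering concrete, the following is a minimal sketch and not the CLUSEQ method or any algorithm from the cited works. It assumes that "patterns" are simply contiguous substrings of a fixed length k (k-mers), that a pattern is frequent when it occurs in at least min_support sequences, and that sequences are compared by the Jaccard similarity of their frequent-pattern sets. The function names, the toy sequences, and the parameter values are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def frequent_kmers(sequences, k, min_support):
    """Return the k-mers that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # Each sequence contributes at most 1 to the support of a k-mer.
        support.update(kmers(seq, k))
    return {p for p, count in support.items() if count >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (hypothetical): two groups of related sequences.
sequences = ["ACGTACGT", "ACGTTTGT", "GGGCCCAA", "GGGCCCTT"]
patterns = frequent_kmers(sequences, k=3, min_support=2)

# Represent each sequence only by the frequent patterns it contains.
profiles = [kmers(s, 3) & patterns for s in sequences]

# Pairwise similarities on the pattern profiles; these could be passed
# to any standard clustering procedure.
similarity = [[jaccard(p, q) for q in profiles] for p in profiles]
```

Comparing short pattern profiles instead of full sequences is what keeps the similarity computation cheap in this sketch; the choice of k and min_support controls how much of each sequence survives in its profile.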

4 Frequent Patterns for Classification

The problem of data classification is closely related to that of frequent pattern mining,

particularly in the context of rule-based methods. A classification rule is a condition

of the form:

$$A_1 = a_1,\; A_2 = a_2 \;\Rightarrow\; C = c$$

In this case, the left-hand side of the rule implies that attributes $A_1$ and $A_2$ should take

on values $a_1$ and $a_2$, respectively, and the right-hand side implies that the class value

should be $c$. The training phase creates a set of rules from the labeled data, whereas

the testing phase determines the relevant (or fired) rules, for which the left-hand side

of the rule matches the test instance. The final class label for the test instance is

determined as a carefully designed combination of the class labels on the right-hand

side of the fired rules. In addition, a default (or catch-all) label may be defined, if no

rules are fired by a test instance, in order to ensure full coverage.
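
The testing phase described above can be sketched as follows. This is an illustrative sketch only: the rule representation, the function names, and the toy rules are hypothetical, and a simple majority vote over the fired rules stands in for the "carefully designed combination", which the text does not specify.

```python
from collections import Counter

# Each rule is (antecedent, class_label); the antecedent maps attribute
# names to required values. These toy rules are hypothetical.
rules = [
    ({"A1": "a1", "A2": "a2"}, "c"),
    ({"A1": "a1"}, "c"),
    ({"A2": "b2"}, "d"),
]


def fired_rules(rules, instance):
    """Return the rules whose left-hand side matches the test instance."""
    return [
        (antecedent, label)
        for antecedent, label in rules
        if all(instance.get(attr) == val for attr, val in antecedent.items())
    ]


def classify(rules, instance, default_label):
    """Predict a label by majority vote over fired rules (an assumption).

    Falls back to default_label when no rule fires, which ensures that
    every test instance receives a label.
    """
    fired = fired_rules(rules, instance)
    if not fired:
        return default_label
    votes = Counter(label for _, label in fired)
    return votes.most_common(1)[0][0]


matched = classify(rules, {"A1": "a1", "A2": "a2"}, default_label="d")
unmatched = classify(rules, {"A1": "x", "A2": "y"}, default_label="default")
```

Here the first instance fires the first two rules and is labeled by their common class, while the second instance fires no rule and receives the default label. Practical rule-based classifiers often weight or order the fired rules rather than counting them equally.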

Since classification rules have a form very similar to that of association rules, it is

possible to determine relevant patterns from the data with the use of association

rule mining techniques. The main goal is to ensure that the patterns are sufficiently
