Applications of Frequent Pattern Mining - Frequent Pattern Mining

discriminative for classification, and the support criterion does not become too

dominant in the rule selection process. The earliest work on the connections between

classification and association rule mining was provided in [ 18 ]. Subsequently, one

of the most popular methods for classification based on associations was the CBA (or

Classification Based on Associations ) method proposed in [ 87 ]. This method is also

available as a practical software package [ 147 ]. Subsequently, another technique for

classification on the basis of the FP-Growth method for association rule mining was

the CMAR method [ 77 ]. Some techniques focus more directly on finding

discriminative patterns, with a special focus on the discriminative power of the patterns with

respect to the class labels. Discriminative frequent pattern mining methods, which

are particularly tailored to classification, are discussed in [ 33 ]. Such methods have

also been used for software bug detection [ 90 ]. Methods for using discriminative

frequent patterns in order to create decision trees are discussed in [ 49 ].

Such techniques have also been extended to other data domains. For example,

methods for classification of structural data and graphs with the use of rule-based

patterns are discussed in [ 140 ]. In these methods, discriminative subtrees and subgraphs

are discovered from the underlying structured data, and are used for the purposes of

classification. Some methods have also been designed for constructing classification

rules from spatio-temporal data, in order to determine anomalies in the form of rare

classes [ 82 ]. Rule-based methods have also been used in order to classify strings

with the use of the wavelet representation [ 1 ]. The idea is that the wavelets provide

a multi-granularity representation of the data on which the rules are constructed.

Test sequences are classified by first converting them to the wavelet representation,

and then using the relevant rules for classification purposes. The relevant rules are

determined by matching the test instance with the predicates on the left hand side of

the rules. Association rules have also been used for medical image classification in

the context of spatial data [ 20 ].

The typical approach in all of these methods is quite similar. The first step is

to mine all frequent patterns above a given support, as in standard association rule

mining algorithms. Such patterns may be mined either on the entire database

or on each class-specific database. The latter is preferred when there is a significant

imbalance between the classes in order to ensure that the patterns relevant to the

rare class are not lost in the pattern mining process. Subsequently, the confidence

of each of these frequent patterns with respect to the class variable is determined.

The patterns which have high confidence with respect to the class variable are then

determined and reported. Since the number of possible rules which satisfy the support

and confidence constraints may be very high, it is usually desirable to pick a small

subset of rules which reflect the behavior in the training data effectively. In some

methods such as in [ 122 ], the best rules for classification are mined directly, rather

than as a post-processing phase in order to ensure better efficiency. This set of rules

defines the training model for the classification process. For a given test instance, the

rules whose left-hand-side pattern matches the test instance are

identified. These rules are prioritized with one or more criteria such as the confidence

and support. This priority is used to determine which class is most relevant to the test

instance by combining the votes from the different rules in a prioritized or weighted
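The general pipeline described above (mine frequent patterns, compute their confidence with respect to the class variable, keep the high-confidence rules, then classify a test instance from the matching rules) can be illustrated with a minimal Python sketch. It is not the algorithm of CBA, CMAR, or any other cited method. The function names mine_rules and classify, the thresholds, and the toy data are all hypothetical. The sketch assumes that brute-force enumeration of short itemsets stands in for a real frequent pattern miner such as FP-Growth, and that a confidence-weighted vote is one acceptable way of combining the matching rules.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(transactions, labels, min_support=0.3, min_confidence=0.6, max_len=2):
    """Return rules (pattern, label, confidence, support), best first.

    Assumption: brute-force enumeration of itemsets up to max_len replaces
    a real frequent pattern miner; it is only practical for toy data.
    """
    n = len(transactions)
    pattern_count = defaultdict(int)        # occurrences of each pattern
    pattern_class_count = defaultdict(int)  # occurrences of pattern with a class
    for items, label in zip(transactions, labels):
        unique_items = sorted(set(items))
        for k in range(1, max_len + 1):
            for pattern in combinations(unique_items, k):
                pattern_count[pattern] += 1
                pattern_class_count[(pattern, label)] += 1

    rules = []
    for (pattern, label), joint in pattern_class_count.items():
        support = pattern_count[pattern] / n        # support of the left hand side
        confidence = joint / pattern_count[pattern]  # fraction carrying this class
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(pattern), label, confidence, support))

    # Prioritize by confidence, then by support.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules


def classify(rules, instance, default=None):
    """Combine the matching rules with a confidence-weighted vote (an assumption)."""
    instance = set(instance)
    votes = defaultdict(float)
    for pattern, label, confidence, _support in rules:
        if pattern <= instance:  # left hand side is contained in the instance
            votes[label] += confidence
    if not votes:
        return default
    return max(votes, key=votes.get)


# Hypothetical toy data: item "a" indicates "pos", item "d" indicates "neg".
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "d"], ["d"], ["c", "d"]]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

rules = mine_rules(transactions, labels)
print(classify(rules, ["a", "b"]))
print(classify(rules, ["b", "d"]))
```

Here the patterns are mined on the entire database. When the classes are heavily imbalanced, the same counting could instead be run separately on each class-specific subset, as the text suggests, so that patterns relevant to the rare class are not pruned by the support threshold.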
