approach places less importance on the prediction of individual rules and is related
to the ensemble idea from machine learning: if predictors' errors are uncorrelated,
using several of them should remove many non-systematic errors.
A straightforward method is majority voting, in which the predicted class label
is the one predicted by the majority of rules. Alternatively, rules' votes can
be weighted, for instance by their accuracy, strength, or support in a given class,
and the class with the strongest vote is predicted. Many pattern-based classifiers
use this scheme: CMAR performs weighted voting, discounting rules' votes by their
deviation from their potentially maximal χ²-score, whereas FITCARE simply adds
up rules' relative support per class, as does ARC-BC. CAEP sums up patterns' growth
rates multiplied by their relative support in a class, and DEEP takes the proportion of
instances in a class that contain any of the voting patterns as the weight of the vote
for that class. Harmony includes three voting options: either the highest-confidence
rule, all rules, or the top-k rules vote for a particular class, similar to XRULES, which
also uses different rule strength measures.
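To make the weighted-voting schemes concrete, the following sketch accumulates the votes of all rules that match an instance, weighting each vote by the rule's strength, and predicts the class with the largest total vote. The rules, weights, and items are hypothetical and not taken from any of the systems above; this is a minimal sketch only.

```python
from collections import defaultdict

# Hypothetical rules: (antecedent itemset, predicted class, weight), where the
# weight could be the rule's confidence, chi^2-based strength, or class support.
rules = [
    ({"a", "b"}, "pos", 0.90),
    ({"c"},      "neg", 0.75),
    ({"a"},      "pos", 0.60),
]

def predict(instance, rules):
    """Weighted voting: every rule whose antecedent is contained in the instance
    votes for its class with its weight; the class with the strongest
    accumulated vote is predicted (ties broken arbitrarily here)."""
    votes = defaultdict(float)
    for antecedent, cls, weight in rules:
        if antecedent <= instance:             # the rule matches the instance
            votes[cls] += weight
    return max(votes, key=votes.get) if votes else None   # None: no rule fires

print(predict({"a", "b", "d"}, rules))   # "pos" (votes 0.90 + 0.60 vs. 0.0)
print(predict({"c", "d"}, rules))        # "neg"
```

Majority voting is the special case in which every matching rule votes with weight 1, and a top-k variant restricts the sum to the k strongest matching rules.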
CTC has used different options: the decision list, majority vote, and two weighted
voting strategies, as has CORCLASS.
The analogy with machine learning is exploited most fully in the GBOOST algorithm
[34], which treats patterns as weak learners. GBOOST modifies the LPBOOST boosting
algorithm from the machine learning literature to iteratively search for patterns
instead of weak learners. It can be shown that, under certain conditions, this
algorithm finds optimal linear classification and regression models in which the
patterns serve as features. The boosting procedure operates by iteratively modifying
the weights of examples based on the outcome of a linear program.
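The following sketch illustrates this loop under simplifying assumptions: itemset patterns play the role of weak learners, each round adds the pattern with the highest weighted agreement with the labels, and the example weights and margin are recomputed by solving a small linear program, in the style of LPBOOST column generation. The data, candidate patterns, and parameter values are hypothetical, and the code is a rough sketch rather than the actual GBOOST implementation.

```python
import numpy as np
from scipy.optimize import linprog

def responses(pattern, transactions):
    """+1 if the itemset pattern occurs in a transaction, -1 otherwise."""
    return np.array([1.0 if pattern <= t else -1.0 for t in transactions])

def lpboost_patterns(candidates, transactions, y, nu=0.2, rounds=20, tol=1e-6):
    """LPBoost-style column generation with patterns as weak learners: example
    weights u and margin beta come from a restricted dual LP, and the loop
    stops when no candidate pattern violates the dual constraint."""
    n = len(y)
    u, beta = np.full(n, 1.0 / n), 0.0
    chosen, H, res = [], [], None
    for _ in range(rounds):
        # "Weak learner" step: pattern with maximal weighted agreement with y.
        gains = [(np.sum(u * y * responses(q, transactions)), q) for q in candidates]
        gain, p = max(gains, key=lambda gq: gq[0])
        if gain <= beta + tol:
            break
        chosen.append(p)
        H.append(responses(p, transactions))
        # Restricted dual LP over x = [u_1, ..., u_n, beta]:
        #   minimise beta
        #   s.t. sum_i u_i * y_i * h_j(x_i) <= beta  for every chosen pattern j,
        #        sum_i u_i = 1,  0 <= u_i <= 1/(n*nu),  beta free.
        c = np.r_[np.zeros(n), 1.0]
        A_ub = np.array([np.r_[y * h, -1.0] for h in H])
        A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)
        bounds = [(0.0, 1.0 / (n * nu))] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)), A_eq=A_eq, b_eq=[1.0],
                      bounds=bounds, method="highs")
        u, beta = res.x[:n], res.x[n]
    # The linear-model weights of the chosen patterns are the dual values of the
    # <= constraints (SciPy reports them as non-positive marginals).
    alpha = -res.ineqlin.marginals if chosen else np.array([])
    return chosen, alpha

# Tiny hypothetical example: transactions as item sets, labels in {-1, +1}.
transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
y = np.array([1.0, 1.0, -1.0, -1.0])
candidates = [frozenset(s) for s in ({"a"}, {"b"}, {"c"}, {"a", "b"})]
patterns, weights = lpboost_patterns(candidates, transactions, y)
# A new transaction t would be classified by
# sign(sum_j weights[j] * (+1 if patterns[j] <= t else -1)).
print(patterns, weights)
```

In GBOOST the weak-learner step searches the pattern space directly rather than scanning a fixed candidate list; the list above merely stands in for that search.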
A particular feature of some sets of rules is that they represent decision trees.
Essentially, every path from the root of a decision tree to a leaf can be seen
as a rule that predicts the label of that leaf, and together these rules cover
disjoint parts of the data. It is hence not surprising that patterns can also be
used to represent paths in decision trees. This observation was exploited in the
DL8 approach by [30], which showed that by post-processing a set of patterns found
under constraints, a decision tree can be constructed that is optimal under certain
conditions. The approach differs from Tree² (see below) in that each pattern
represents a path in the tree, whereas in Tree² each pattern represents a node.
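The path-rule correspondence can be made explicit with a small sketch: given a toy decision tree over hypothetical binary item tests (not DL8 itself), every root-to-leaf path is turned into a rule whose antecedent collects the test outcomes along the path and whose consequent is the leaf label.

```python
# A toy decision tree as nested tuples (test_item, absent_subtree, present_subtree);
# the left branch means the item is absent, the right branch that it is present,
# and leaves carry class labels.  Tree, items, and labels are hypothetical.
tree = ("a",
        ("b", "neg", "pos"),       # a absent:  test b
        ("c", "pos", "neg"))       # a present: test c

def paths_to_rules(node, prefix=()):
    """Enumerate every root-to-leaf path as a rule: the antecedent is the set of
    test outcomes on the path, the consequent is the leaf's label.  The
    resulting rules cover disjoint parts of the data."""
    if isinstance(node, str):                          # leaf: emit one rule
        return [(frozenset(prefix), node)]
    item, absent, present = node
    return (paths_to_rules(absent,  prefix + (f"not {item}",)) +
            paths_to_rules(present, prefix + (item,)))

for antecedent, label in paths_to_rules(tree):
    print(sorted(antecedent), "->", label)
# ['not a', 'not b'] -> neg
# ['b', 'not a'] -> pos
# ['a', 'not c'] -> pos
# ['a', 'c'] -> neg
```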
4.2 Indirect Classification
Indirect classification comes in several flavors. First, there are techniques that
partition the data, sort unseen instances into a particular block, and use the majority
label of that block's training instances to make the prediction, as decision trees do.
Tree² and MbT build this kind of classifier. Other machine learning formalisms
can also be adapted to work with supervised patterns: the LB algorithm uses a Naïve
Bayes-like formulation to derive predictions from the support of patterns in different