approach places less importance on the prediction of individual rules and is related
to the ensemble idea from machine learning: if predictors' errors are uncorrelated,
using several of them should remove many non-systematic errors.
A straightforward method is majority voting, in which the predicted class label
is the one predicted by the majority of rules. Alternatively, rules' votes can
be weighted, for instance by their accuracy, strength, or support in a given class,
and the class with the strongest vote is predicted. Many pattern-based classifiers
use this scheme: CMAR performs weighted voting, discounting rules' votes by their
deviation from their potentially maximal χ²-score, whereas FITCARE simply adds
up rules' relative support per class, as does ARC-BC. CAEP sums up patterns' growth
rates multiplied by their relative support in a class, and DEEP takes the proportion of
instances in a class that contain any of the voting patterns as the weight of the vote
for that class. Harmony includes three voting options: either the highest-confidence
rule, all rules, or the top-k rules vote for a particular class, similar to XRULES, which
also uses different rule strength measures.
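To make the weighted-voting schemes concrete, the following sketch accumulates the votes of all rules that match an instance, weighting each vote by the rule's strength, and predicts the class with the largest total vote. The rules, weights, and items are hypothetical and not taken from any of the systems above; this is a minimal sketch only.

```python
from collections import defaultdict

# Hypothetical rules: (antecedent itemset, predicted class, weight), where the
# weight could be the rule's confidence, chi^2-based strength, or class support.
rules = [
    ({"a", "b"}, "pos", 0.90),
    ({"c"},      "neg", 0.75),
    ({"a"},      "pos", 0.60),
]

def predict(instance, rules):
    """Weighted voting: every rule whose antecedent is contained in the instance
    votes for its class with its weight; the class with the strongest
    accumulated vote is predicted (ties broken arbitrarily here)."""
    votes = defaultdict(float)
    for antecedent, cls, weight in rules:
        if antecedent <= instance:             # the rule matches the instance
            votes[cls] += weight
    return max(votes, key=votes.get) if votes else None   # None: no rule fires

print(predict({"a", "b", "d"}, rules))   # "pos" (votes 0.90 + 0.60 vs. 0.0)
print(predict({"c", "d"}, rules))        # "neg"
```

Majority voting is the special case in which every matching rule votes with weight 1, and a top-k variant restricts the sum to the k strongest matching rules.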
CTC has used different options: the decision list, majority vote, and two weighted
voting strategies, as has CORCLASS.
The analogy with machine learning is exploited most fully in the GBOOST algorithm
[34], which treats patterns as weak learners. GBOOST modifies the LPBOOST boosting
algorithm from the machine learning literature to iteratively search for patterns
instead of weak learners. It can be shown that, under certain conditions, this
algorithm finds optimal linear classification and regression models in which the
patterns serve as features. The boosting procedure operates by iteratively modifying
the weights of examples based on the outcome of a linear program.
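The following sketch illustrates this loop under simplifying assumptions: itemset patterns play the role of weak learners, each round adds the pattern with the highest weighted agreement with the labels, and the example weights and margin are recomputed by solving a small linear program, in the style of LPBOOST column generation. The data, candidate patterns, and parameter values are hypothetical, and the code is a rough sketch rather than the actual GBOOST implementation.

```python
import numpy as np
from scipy.optimize import linprog

def responses(pattern, transactions):
    """+1 if the itemset pattern occurs in a transaction, -1 otherwise."""
    return np.array([1.0 if pattern <= t else -1.0 for t in transactions])

def lpboost_patterns(candidates, transactions, y, nu=0.2, rounds=20, tol=1e-6):
    """LPBoost-style column generation with patterns as weak learners: example
    weights u and margin beta come from a restricted dual LP, and the loop
    stops when no candidate pattern violates the dual constraint."""
    n = len(y)
    u, beta = np.full(n, 1.0 / n), 0.0
    chosen, H, res = [], [], None
    for _ in range(rounds):
        # "Weak learner" step: pattern with maximal weighted agreement with y.
        gains = [(np.sum(u * y * responses(q, transactions)), q) for q in candidates]
        gain, p = max(gains, key=lambda gq: gq[0])
        if gain <= beta + tol:
            break
        chosen.append(p)
        H.append(responses(p, transactions))
        # Restricted dual LP over x = [u_1, ..., u_n, beta]:
        #   minimise beta
        #   s.t. sum_i u_i * y_i * h_j(x_i) <= beta  for every chosen pattern j,
        #        sum_i u_i = 1,  0 <= u_i <= 1/(n*nu),  beta free.
        c = np.r_[np.zeros(n), 1.0]
        A_ub = np.array([np.r_[y * h, -1.0] for h in H])
        A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)
        bounds = [(0.0, 1.0 / (n * nu))] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)), A_eq=A_eq, b_eq=[1.0],
                      bounds=bounds, method="highs")
        u, beta = res.x[:n], res.x[n]
    # The linear-model weights of the chosen patterns are the dual values of the
    # <= constraints (SciPy reports them as non-positive marginals).
    alpha = -res.ineqlin.marginals if chosen else np.array([])
    return chosen, alpha

# Tiny hypothetical example: transactions as item sets, labels in {-1, +1}.
transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
y = np.array([1.0, 1.0, -1.0, -1.0])
candidates = [frozenset(s) for s in ({"a"}, {"b"}, {"c"}, {"a", "b"})]
patterns, weights = lpboost_patterns(candidates, transactions, y)
# A new transaction t would be classified by
# sign(sum_j weights[j] * (+1 if patterns[j] <= t else -1)).
print(patterns, weights)
```

In GBOOST the weak-learner step searches the pattern space directly rather than scanning a fixed candidate list; the list above merely stands in for that search.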
A particular feature of some sets of rules is that they represent decision trees.
Essentially, every path from the root of a decision tree to a leaf can be seen
as a rule that predicts the label of that leaf, and together these rules cover
disjoint parts of the data. It is hence not surprising that patterns can also be
used to represent paths in decision trees. This observation was exploited in the
DL8 approach by [30], which showed that by post-processing a set of patterns found
under constraints, a decision tree can be constructed that is optimal under certain
conditions. The approach differs from Tree² (see below) in that each pattern
represents a path in the tree, whereas in Tree² each pattern represents a node.
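The path-rule correspondence can be made explicit with a small sketch: given a toy decision tree over hypothetical binary item tests (not DL8 itself), every root-to-leaf path is turned into a rule whose antecedent collects the test outcomes along the path and whose consequent is the leaf label.

```python
# A toy decision tree as nested tuples (test_item, absent_subtree, present_subtree);
# the left branch means the item is absent, the right branch that it is present,
# and leaves carry class labels.  Tree, items, and labels are hypothetical.
tree = ("a",
        ("b", "neg", "pos"),       # a absent:  test b
        ("c", "pos", "neg"))       # a present: test c

def paths_to_rules(node, prefix=()):
    """Enumerate every root-to-leaf path as a rule: the antecedent is the set of
    test outcomes on the path, the consequent is the leaf's label.  The
    resulting rules cover disjoint parts of the data."""
    if isinstance(node, str):                          # leaf: emit one rule
        return [(frozenset(prefix), node)]
    item, absent, present = node
    return (paths_to_rules(absent,  prefix + (f"not {item}",)) +
            paths_to_rules(present, prefix + (item,)))

for antecedent, label in paths_to_rules(tree):
    print(sorted(antecedent), "->", label)
# ['not a', 'not b'] -> neg
# ['b', 'not a'] -> pos
# ['a', 'not c'] -> pos
# ['a', 'c'] -> neg
```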
4.2 Indirect Classification
Indirect classification comes in several flavors. First, there are techniques that
partition the data, sort unseen instances into a particular block, and use the majority
label of that block's training instances to make the prediction, as decision trees do.
Tree² and MbT build this kind of classifier. Other machine learning formalisms
can also be adapted to work with supervised patterns: the LB algorithm uses a Naïve
Bayes-like formulation to derive predictions from the support of patterns in different