2. Based on all patterns, manipulate the entire data
3. Recur on the new blocks
Notwithstanding this statement, the REMINE algorithm proposed by [45] is so far
the only one to proceed in this way to iteratively mine supervised patterns.
3.4
Data Instance-Based Selection
In addition to the partition-based techniques, there is another paradigm, which selects
patterns based on individual instances. The Harmony algorithm retains for each
training instance the highest-confidence rule, as does CCCS, whereas the technique
described by [28], called Large Bayes (LB), selects patterns based on the instances
whose labels are to be predicted. This is similar to DEEP, described by [26], and
LAC, proposed by [37], which only generate patterns that match the instances to be
predicted by projecting the data on the items contained in the unlabeled instance.
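The projection step used by DEEP- and LAC-style methods can be sketched as follows; the data representation (transactions as item sets paired with a label) and the function name are illustrative assumptions, not the published implementations:

```python
# Sketch of instance-based projection (assumed simplification): before
# mining, each training transaction is intersected with the items of the
# unlabeled instance, so only patterns that can match it are generated.

def project(training_data, test_items):
    """Keep, for each training transaction, only the items that also
    appear in the unlabeled test instance; drop empty intersections."""
    test_items = set(test_items)
    projected = []
    for items, label in training_data:
        kept = set(items) & test_items
        if kept:  # transaction shares at least one item with the instance
            projected.append((frozenset(kept), label))
    return projected

train = [({"a", "b", "c"}, "+"), ({"b", "d"}, "-"), ({"e"}, "+")]
projected = project(train, {"a", "b"})
```

Any pattern mined from the projected data is guaranteed to be contained in the unlabeled instance, which shrinks the search space to exactly the patterns needed for this one prediction.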
4
Classifier Construction
After supervised patterns have been mined, and suitable subsets have been selected,
the remaining question is how to employ them for predictive purposes. The solutions
that have been found fall into two main categories: (1) direct use of patterns as rules to
predict the label of an unseen class—the techniques following this paradigm borrow
heavily from rule learning approaches in machine learning, or (2) indirect use of
patterns in a model; here patterns are typically treated as features that are used in
well-established machine learning methods.
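The indirect paradigm can be illustrated with a minimal sketch, assuming patterns and instances are represented as item sets (a simplification; real systems attach labels and quality scores to each pattern):

```python
# Sketch of the indirect use of patterns: each mined pattern becomes a
# binary feature (1 if the pattern's items are all present in the
# instance, 0 otherwise), and the resulting vectors can feed any
# standard learner such as an SVM or decision tree.

def to_feature_vector(instance_items, patterns):
    """Map an instance to a 0/1 vector over the mined patterns."""
    items = set(instance_items)
    return [1 if set(p) <= items else 0 for p in patterns]

patterns = [{"a"}, {"a", "b"}, {"c"}]
vector = to_feature_vector({"a", "b"}, patterns)  # -> [1, 1, 0]
```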
4.1
Direct Classification
There are two main methods in rule learning when it comes to making predictions.
In decision lists, rules are ordered according to some criterion and the first rule
that matches the unseen instance makes the prediction. For such classifiers to
work, rules are required that have high accuracy and at the same time do not overfit the training data.
This means that certain approaches to optimizing quality measures will work better
than others: given that maximizing information gain or χ² trades off correlation
with effect size, maximizing confidence or WRACC will be more suitable for such
classifiers. CBA follows this first approach, ordering the rule list by confidence
(descending), support (descending) and length (ascending), as does LAC, ordering
by information gain (descending).
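A decision list with CBA-style ordering can be sketched as below; the rule representation as a tuple and the default class are illustrative assumptions:

```python
# Minimal decision-list sketch with CBA-style rule ordering:
# confidence descending, support descending, antecedent length ascending.
# The first rule whose antecedent is contained in the instance fires.

def order_rules(rules):
    # rules: list of (antecedent_items, label, confidence, support)
    return sorted(rules, key=lambda r: (-r[2], -r[3], len(r[0])))

def predict(rules, instance, default="?"):
    instance = set(instance)
    for antecedent, label, _conf, _supp in order_rules(rules):
        if set(antecedent) <= instance:  # first matching rule predicts
            return label
    return default  # no rule matched; fall back to a default class

rules = [
    ({"a"}, "+", 0.9, 10),
    ({"b"}, "-", 0.9, 12),   # ties on confidence; higher support ranks first
    ({"a", "b"}, "-", 0.8, 5),
]
label = predict(rules, {"a", "b", "c"})  # rule ({"b"}, "-") fires -> "-"
```

Note that the tie-breaking criteria matter: for the instance {a, b, c} both single-item rules match, and the support tie-break determines which one fires.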
The second method consists of various voting mechanisms that collect all rules
that match the unseen instance and have each class “gather votes” from them. This