Database Reference
In-Depth Information
in order of confidence, but only those instances are removed which are classified
correctly by the rule under consideration.
The ART algorithm [ 17 ] learns several best car s, splits their respective coverage
off, and re-iterates on the uncovered data. DDPMine by [ 12 ] is perhaps the algorithm
that stays truest to the original sequential covering idea: it mines a highest-scoring
pattern, removes all covered instances from the data, and recurs.
Divide-and-Conquer Techniques The second type of “local-local” techniques
takes its cues from decision tree induction :
1. Find the best splitting criterion on a subset of the data
2. Split the data into two blocks corresponding to covered and uncovered instances
3. Recur on the new blocks
A potential advantage of this type of technique is that all mistakes by one pattern can
be corrected by other patterns, since all data are reused in later instances to derive
additional patterns. In addition, patterns that might not appear interesting on the
whole data might become relevant as soon as parts of the data are removed.
This technique is most commonly used in an iterative mining setting, in which the
best pattern is searched for using a branch-and-bound top-1 pattern mining algorithm.
Examples are Tree 2 , proposed by [ 7 ], and M b T[ 15 ].
A post-processing approach can also be used. For instance, [ 18 ] developed a
setting in which δ -free patterns are first mined, and then combined for use as tests in
a decision tree.
3.2
Global Evaluation, Global Modification
Alternatively, patterns can be mined or evaluated on the entire data set, and all blocks
in the partition are modified. While this means that mining (or selecting) patterns is
done using the maximal amount of information, this usually has to be paid for by
increased computational complexity, as in each iteration the complete data needs to
be traversed. Additionally, the semantics of patterns' relationships are less easy to
understand than in the case of “local-local” approaches.
Such techniques necessarily proceed sequentially, either post-processing or min-
ing patterns one after another. The Picker algorithm by [ 8 ] performs post-processing
in this manner, picking the pattern that creates the most balanced partition, and
splitting all blocks accordingly. It proceeds according to the first option for post-
processing described above, considering all promising patterns. The FCORK [ 35 ]
technique uses a measure based on correspondences :
Definition 3.2
Given an equivalence relation
P
on a labeled data set
D C
=
D + D , the number of correspondences in this partition is calculated as occ (
P
=
)
[ d ] D C / P |
D + |·|
D |
.
and uses this measure both to post-process mined patterns, and to iteratively mine
patterns that reduce correspondences the most. This criterion, as well as that used
[ d ]
[ d ]
Search WWH ::




Custom Search