1. Mine or evaluate a (set of) pattern(s), possibly only on parts of the data
2. Based on the result of step 1, modify the partition, for instance by removing one or several blocks, or by partitioning them further
3. Return to step 1, unless a stopping criterion is met
The differences lie mainly in the blocks on which patterns are evaluated, and in the
choice of blocks that are modified.
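To make this loop concrete, the following Python sketch renders it generically. All three callables (`mine_patterns`, `modify_partition`, `done`) are hypothetical placeholders for technique-specific choices, not any particular algorithm from the literature:

```python
def iterative_pattern_set_mining(data, mine_patterns, modify_partition, done):
    """Generic mine-and-modify loop over a partition of the data.

    mine_patterns(partition)               -- step 1: mine/evaluate patterns,
                                              possibly only on parts of the data
    modify_partition(partition, patterns)  -- step 2: e.g. remove or split blocks
    done(partition, mined)                 -- the stopping criterion
    """
    partition = [data]  # start from the trivial partition: one block
    mined = []
    while True:
        patterns = mine_patterns(partition)                # step 1
        mined.extend(patterns)
        partition = modify_partition(partition, patterns)  # step 2
        if done(partition, mined):                         # step 3
            return mined
```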
3.1 Local Evaluation, Local Modification
The first, and largest, class of techniques evaluates or mines patterns locally, i.e. only on some of the blocks of a partition, and then also modifies only some of those blocks, typically only those blocks from which the patterns have been mined. This includes in particular those techniques that draw more or less directly on machine learning forebears.
Separate-and-Conquer Sequential “local-local” techniques owe much to the sequential covering paradigm of early rule learners. They start from the full database and iteratively remove examples from it, as follows:
1. Find the best rule on the currently remaining data
2. Remove all data covered by that rule
3. Return to step 1.
This approach falls squarely into the “local-local” category. Each pattern splits the
data that it has been mined on into two blocks (the local modification) and its successor
pattern is only mined on one of these, the uncovered one (the local evaluation).
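A minimal sketch of this covering loop, assuming a generic rule miner `find_best_rule` and a coverage predicate `covers` (both hypothetical placeholders):

```python
def separate_and_conquer(data, find_best_rule, covers):
    """Sequential covering: mine a rule, remove covered data, repeat.

    find_best_rule(examples) -> best rule on the remaining data, or None
    covers(rule, example)    -> True if the rule covers the example
    """
    rules = []
    remaining = list(data)
    while remaining:
        rule = find_best_rule(remaining)  # step 1: best rule on remaining data
        if rule is None:                  # no acceptable rule left
            break
        rules.append(rule)
        # step 2: local modification -- split off the covered block; the
        # next rule is mined only on the uncovered block (local evaluation)
        remaining = [x for x in remaining if not covers(rule, x)]
    return rules
```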
Several early algorithms have used this approach for post-processing, for instance CBA, ARC-BC, and CMAR, whose authors refer to it as database coverage.
Separate-and-conquer can be applied both in the post-processing setting and in
the iterative mining setting.
Post-processing can be done in two ways: (1) considering the complete set of previously mined patterns in each iteration of the sequential covering algorithm, or (2) fixing the order in which patterns are considered and searching for the best rule only among those rules that have not yet been considered in that order. The latter means that (a) each pattern is considered only once (if it is rejected, it will never be evaluated again), and (b) the decision as to which patterns are “best” given certain data is effectively made before pattern set mining. In return, however, the complexity of the learning algorithm is lower.
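The fixed-order variant (2) can be sketched as a single pass over the pre-mined, pre-sorted patterns. The `covers` predicate and the coverage threshold are assumptions for illustration, not taken from the cited systems; a threshold above 1 anticipates the variation, described below, of removing an instance only once several selected rules cover it:

```python
def fixed_order_coverage(patterns, data, covers, coverage_threshold=1):
    """Fixed-order post-processing (variant 2): each pattern is seen once.

    `patterns` must already be sorted by the quality order fixed before
    pattern set mining; `covers(p, x)` is a hypothetical predicate.
    With coverage_threshold > 1, an instance is removed only after that
    many selected patterns cover it.
    """
    selected = []
    counts = [0] * len(data)        # coverage count per instance
    active = set(range(len(data)))  # indices of remaining instances
    for p in patterns:
        covered = [i for i in active if covers(p, data[i])]
        if not covered:
            continue                # rejected patterns are never revisited
        selected.append(p)
        for i in covered:
            counts[i] += 1
            if counts[i] >= coverage_threshold:
                active.discard(i)
        if not active:
            break
    return selected
```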
The algorithms mentioned above (CBA, ARC-BC, CMAR) proceed in a fixed order. CMAR differs from the other algorithms in removing data instances only after they have been covered by several rules, guided by a user-supplied parameter. CORCLASS also uses sequential covering with a fixed order as post-processing. Another variation was proposed by [1] in the context of string classification; here, the rules are processed