1. Mine or evaluate a (set of) pattern(s), possibly only on parts of the data
2. Based on the result of step 1, modify the partition, for instance by removing one or several blocks, or by partitioning them further
3. Return to step 1, unless a stopping criterion is met
The differences lie mainly in the blocks on which patterns are evaluated, and in the
choice of blocks that are modified.
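To make this loop concrete, the following Python sketch renders it generically. All three callables (`mine_patterns`, `modify_partition`, `done`) are hypothetical placeholders for technique-specific choices, not any particular algorithm from the literature:

```python
def iterative_pattern_set_mining(data, mine_patterns, modify_partition, done):
    """Generic mine-and-modify loop over a partition of the data.

    mine_patterns(partition)               -- step 1: mine/evaluate patterns,
                                              possibly only on parts of the data
    modify_partition(partition, patterns)  -- step 2: e.g. remove or split blocks
    done(partition, mined)                 -- the stopping criterion
    """
    partition = [data]  # start from the trivial partition: one block
    mined = []
    while True:
        patterns = mine_patterns(partition)                # step 1
        mined.extend(patterns)
        partition = modify_partition(partition, patterns)  # step 2
        if done(partition, mined):                         # step 3
            return mined
```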
3.1 Local Evaluation, Local Modification
The first, and largest, class of techniques evaluates or mines patterns locally, i.e. only on some of the blocks of a partition, and then also modifies only some of those blocks, typically only those blocks from which the patterns have been mined. This includes in particular those techniques that draw more or less directly on machine learning forebears.
Separate-and-Conquer Sequential “local-local” techniques owe much to the sequential covering paradigm of early rule learners. They start from the full database and iteratively remove examples from it, as follows:
1. Find the best rule on the currently remaining data
2. Remove all data covered by that rule
3. Return to step 1.
This approach falls squarely into the “local-local” category. Each pattern splits the
data that it has been mined on into two blocks (the local modification) and its successor
pattern is only mined on one of these, the uncovered one (the local evaluation).
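A minimal sketch of this covering loop, assuming a generic rule miner `find_best_rule` and a coverage predicate `covers` (both hypothetical placeholders):

```python
def separate_and_conquer(data, find_best_rule, covers):
    """Sequential covering: mine a rule, remove covered data, repeat.

    find_best_rule(examples) -> best rule on the remaining data, or None
    covers(rule, example)    -> True if the rule covers the example
    """
    rules = []
    remaining = list(data)
    while remaining:
        rule = find_best_rule(remaining)  # step 1: best rule on remaining data
        if rule is None:                  # no acceptable rule left
            break
        rules.append(rule)
        # step 2: local modification -- split off the covered block; the
        # next rule is mined only on the uncovered block (local evaluation)
        remaining = [x for x in remaining if not covers(rule, x)]
    return rules
```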
Several early algorithms have used this approach for post-processing, for instance CBA, ARC-BC, and CMAR, whose authors refer to it as database coverage.
Separate-and-conquer can be applied both in the post-processing setting and in
the iterative mining setting.
Post-processing can be done in two ways: (1) considering the complete set of previously mined patterns in each iteration of the sequential covering algorithm, or (2) fixing the order in which patterns are considered and searching for the best rule only among those rules that have not yet been considered in that order. The latter means that (a) each pattern is considered only once (if it is rejected, it will never be evaluated again), and (b) the decision as to which patterns are “best” given certain data is effectively made before pattern set mining. In return, however, the complexity of the learning algorithm is lower.
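The fixed-order variant (2) can be sketched as a single pass over the pre-mined, pre-sorted patterns. The `covers` predicate and the coverage threshold are assumptions for illustration, not taken from the cited systems; a threshold above 1 anticipates the variation, described below, of removing an instance only once several selected rules cover it:

```python
def fixed_order_coverage(patterns, data, covers, coverage_threshold=1):
    """Fixed-order post-processing (variant 2): each pattern is seen once.

    `patterns` must already be sorted by the quality order fixed before
    pattern set mining; `covers(p, x)` is a hypothetical predicate.
    With coverage_threshold > 1, an instance is removed only after that
    many selected patterns cover it.
    """
    selected = []
    counts = [0] * len(data)        # coverage count per instance
    active = set(range(len(data)))  # indices of remaining instances
    for p in patterns:
        covered = [i for i in active if covers(p, data[i])]
        if not covered:
            continue                # rejected patterns are never revisited
        selected.append(p)
        for i in covered:
            counts[i] += 1
            if counts[i] >= coverage_threshold:
                active.discard(i)
        if not active:
            break
    return selected
```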
The algorithms mentioned above (CBA, ARC-BC, CMAR) proceed in a fixed order. CMAR differs from the other algorithms in removing data instances only after they have been covered by several rules, guided by a user-supplied parameter. CORCLASS also uses sequential covering with a fixed order as post-processing. Another variation was proposed by [1] in the context of string classification; here, the rules are processed