by the Picker algorithm, is sub-modular, which allows a bound to be given on the quality of the greedy approximation to the optimal solution.
There are other “global-global” techniques that differ in that they do not manipulate the data explicitly: the technique introduced by [11] post-processes patterns by rewarding them for class correlation on the full data and penalizing overlap on data already covered by selected patterns. The Krimp technique, described by [36], also falls into this category, since it evaluates for each pattern how much it adds to the overall, i.e. global, compression of the data, post-processing a fixed order on patterns.
Instead of removing examples, a reasonable alternative is to attach a weight to
examples and modify the weights based on the current composition of a rule set, as
in the following generic approach:
1. Find the best rule on the current weighted data
2. Modify the weights of the examples in the data
3. Return to 1.
A reason to give a lower weight to an example may for instance be that we already
have many rules that predict this example correctly, and we would like to focus on
finding rules for examples that are predicted incorrectly.
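The generic loop above can be sketched as follows. The rule representation (itemsets), the weighted quality measure (total weight of covered examples), and the multiplicative weight decay are all illustrative choices of this sketch; the text deliberately leaves the weighting scheme open.

```python
# Sketch of the generic weighted-covering loop: find the best rule on the
# weighted data, lower the weights of the examples it covers, repeat.
# Rules are itemsets; a rule covers an example that contains all its items.

def covers(rule, example):
    """A rule (a frozenset of items) covers an example containing all of them."""
    return rule <= example

def weighted_quality(rule, data, weights):
    """Score a rule by the total weight of the examples it covers."""
    return sum(w for ex, w in zip(data, weights) if covers(rule, ex))

def weighted_covering(candidate_rules, data, n_rules, decay=0.5):
    weights = [1.0] * len(data)          # every example starts equally important
    selected = []
    for _ in range(n_rules):
        # 1. Find the best rule on the current weighted data
        best = max(candidate_rules,
                   key=lambda r: weighted_quality(r, data, weights))
        selected.append(best)
        # 2. Modify (here: multiplicatively lower) the weights of covered examples
        for i, ex in enumerate(data):
            if covers(best, ex):
                weights[i] *= decay
        # 3. Return to 1.
    return selected

data = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
rules = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]
print(weighted_covering(rules, data, n_rules=2))
```

Because covered examples keep a positive weight instead of being removed, later rules may still profit from them, but the search is steered towards examples that no selected rule covers yet.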
This setting performs global evaluation as each new pattern is evaluated on the
complete dataset and in principle the weights of all examples can be modified.
Examples of approaches within this setting were proposed by [13, 33, 34, 44]; they can be used either in iterative mining or in post-processing. In the first work, transaction weights are adjusted directly in a subgroup discovery setting. Since subgroup discovery is more concerned with mining good descriptions of statistically different subgroups than with accurate prediction, the removal of covered instances is undesirable. The other works, comprising the GPLS and GBOOST algorithms and a Bayesian linear regression technique, derive the transaction weights indirectly from weights for patterns involved in a linear classification or regression function, and mine patterns iteratively. Since pinpoint prediction of a numerical value is difficult, reweighting instances based on the current performance of the function is superior to removing instances.
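To illustrate deriving instance weights from a linear function, the sketch below weights each example by the magnitude of its residual under the current model, so the next mined pattern focuses on poorly predicted examples. The absolute-residual rule and the linear model over binary pattern features are illustrative stand-ins, not the exact schemes used by GPLS, GBOOST, or the Bayesian regression technique.

```python
# Illustrative sketch: instance weights derived indirectly from a linear
# regression function over pattern features, in the boosting-like spirit
# described in the text. The absolute-residual weighting is an assumption
# of this sketch, not the precise scheme of the cited algorithms.

def predict(pattern_weights, example):
    """Linear function over binary pattern features: the sum of the weights
    of the selected patterns that occur in the example."""
    return sum(w for pat, w in pattern_weights.items() if pat <= example)

def instance_weights(pattern_weights, data, targets):
    """Weight each example by the magnitude of its current residual, so
    badly predicted examples count more in the next mining round."""
    return [abs(y - predict(pattern_weights, ex))
            for ex, y in zip(data, targets)]

data = [{"a", "b"}, {"a"}, {"b"}]
targets = [2.0, 1.0, 1.0]
model = {frozenset({"a"}): 1.0}   # one pattern selected so far, weight 1.0
print(instance_weights(model, data, targets))
```

Well-predicted examples receive weight near zero without ever being removed, so they can still influence the evaluation of later patterns if the model's predictions for them degrade.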
The upshot of these techniques is that the increased computational complexity
pays off in a pattern set of smaller cardinality than for “local-local” approaches,
typically of comparable or even better quality.
3.3 Local Evaluation, Global Modification
Given the faster running times yet larger pattern sets of “local-local” approaches, and the more expensive operation yet smaller, high-quality sets of “global-global” techniques, the appeal of “local-global” algorithms is obvious:
1. Find a best pattern on a subset of the data