What is likely to be a good addition to the model? A pattern that helps to reduce
redundancy in the encoding. In our setting, this means correlations between code
usages. If the code for pattern A and the code for pattern B often co-occur, we can gain
bits by introducing a new code C meaning 'A and B'. We can hence find good candidates by
mining frequent patterns in 'encoding space'. Moreover, by employing an optimistic
estimate we can prune large parts of the search space and efficiently identify the
best pattern [59]. In general terms, we have
1. Start with an 'empty' model M.
2. Find that F ∈ 𝓕 that minimizes L(D, M ⊕ F).
3. Add F to M.
4. Repeat steps 2–3 until compression can no longer be improved.
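To make this loop concrete, here is a minimal Python sketch of the generic greedy search. The names candidates, extend, and encoded_length are placeholders for whatever the concrete encoding and pattern type supply; they are not part of any published implementation.

```python
def greedy_mdl_search(data, empty_model, candidates, extend, encoded_length):
    """Generic greedy MDL search (steps 1-4 above).

    All dependencies are passed in, since they depend on the encoding:
      candidates(model)     -> iterable of candidate patterns F
      extend(model, f)      -> the model M extended with F
      encoded_length(d, m)  -> total encoded size L(D, M) in bits
    """
    model = empty_model                          # step 1
    best_len = encoded_length(data, model)
    while True:
        best_f, best_f_len = None, best_len
        for f in candidates(model):              # step 2: best candidate F
            f_len = encoded_length(data, extend(model, f))
            if f_len < best_f_len:
                best_f, best_f_len = f, f_len
        if best_f is None:                       # step 4: no gain, so stop
            return model
        model = extend(model, best_f)            # step 3: add F to M
        best_len = best_f_len
```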
Because of the strong dependence on the specific encoding and pattern type, pro-
viding a universal algorithmic strategy for step 2 is hardly possible: in itemset data,
correlations mean co-occurrences [59]; in sequential data, they mean close-by occur-
rences [64]; and so on. In general, the current encoding of the data has to be inspected
to see whether there are any 'patterns' in it that can be exploited to improve compression.
The SLIM algorithm [59] was the first to implement this strategy for MDL, and
induces KRIMP code tables by iteratively searching for pairs of itemsets that often
occur together. The union of the pair that results in the best improvement in com-
pression is added to the code table. Although it hence considers a search space of
only O(|CT|²) instead of O(|𝓕|²), its results very closely approximate those of the ideal
local greedy strategy, or, KRAMP. In particular for dense data, SLIM can be orders
of magnitude faster than KRIMP, obtaining smaller code tables that offer much more
succinct descriptions of the data.
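The pairwise search can be sketched as follows. The optimistic estimate below, the minimum of the two usage counts as an upper bound on how often the codes can co-occur, is a simplified stand-in for SLIM's actual gain estimate, and all names are illustrative only.

```python
from itertools import combinations

def pairwise_candidates(code_table, usage):
    """Rank unions X | Y of code table elements by an optimistic gain
    estimate. usage[x] is how often x's code is used in the current
    encoding; min(usage[x], usage[y]) upper-bounds how often the two
    codes can co-occur, and hence bounds the achievable gain, so
    low-estimate pairs can be pruned without being evaluated."""
    scored = []
    for x, y in combinations(code_table, 2):
        estimate = min(usage[x], usage[y])
        if estimate > 0:
            scored.append((estimate, x | y))
    scored.sort(key=lambda pair: -pair[0])   # best estimate first
    return [union for _, union in scored]

# Example: itemsets as frozensets with their usage counts.
ct = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
use = {ct[0]: 80, ct[1]: 75, ct[2]: 3}
print(pairwise_candidates(ct, use)[0])   # frozenset({1, 2, 3}) ranked first
```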
To save computation, SQS does not iteratively identify the single best candidate.
Instead, it iteratively generates a set of candidates given the current model, considers
all these candidates in turn, then generates new candidates, and so on, until MDL tells it to
stop.
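In outline, that batch strategy looks like this. Here generate_candidates and try_add are assumed helpers standing in for SQS's candidate generation and MDL acceptance test; they are not its actual API.

```python
def batch_mdl_search(data, model, generate_candidates, try_add):
    """Batch strategy: generate a set of candidates from the current
    model, test each in turn, then regenerate from the updated model;
    stop once a full batch yields no accepted candidate."""
    while True:
        accepted_any = False
        for cand in generate_candidates(model, data):
            new_model, accepted = try_add(model, cand, data)
            if accepted:                 # MDL: compression improved
                model = new_model
                accepted_any = True
        if not accepted_any:             # MDL tells us to stop
            return model
```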
5 MDL for Data Mining
So far we have considered compression for model selection, but it has been argued [15]
and shown in the literature that compression can also be used for many other (data mining)
tasks. For example, we already referred to the Normalized Compression Distance [41]. Another
concrete example is the use of MPEG video compression for image clustering [29].
In these examples, existing compression algorithms are used as 'black boxes' to
approximate Kolmogorov complexity, and usually only dataset-level compression is
required (to be precise, individual strings/objects are treated as 'datasets').
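As a concrete illustration of this black-box use of compressors, the Normalized Compression Distance can be computed with any off-the-shelf compressor; the sketch below uses Python's zlib. Since any real compressor only approximates Kolmogorov complexity, the result is an approximation of the ideal distance.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance with zlib as the compressor:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size in bytes. Because zlib only
    approximates Kolmogorov complexity, values can fall slightly
    outside the ideal [0, 1] range."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Objects that share structure compress well together -> smaller distance.
a, b, c = b"abcabcabc" * 30, b"abcabc" * 45, bytes(range(256)) * 2
print(ncd(a, b) < ncd(a, c))   # True: a is closer to b than to c
```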
In this chapter, we are particularly interested in compression-based models that
allow for inspection, so that any discovered local structure can be interpreted by
domain experts. For that purpose, pattern-based models that can be selected by means
of the MDL principle have been developed. However, we have not yet discussed if