Interesting Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

4.2

Beyond Independence

While the independence model has many positive aspects, such as ease of compu-

tation, intuitive results, as well as interpretability, it is also fair to say it is overly

simplistic: it is naive to assume all item occurrences are independent. In practice, we

may want to take known interaction into account as background knowledge.

4.2.1

Partition Models

With the goal of mining interesting associations, Webb [ 73 ] discusses 6 principles

for identifying itemsets that are unlikely to be interesting, and to this end proposes

to check whether the frequency of an itemset X can either be closely determined by

assuming independence between any of its partitions, or by the frequency of any of

the supersets of X .

The so-called partition model which is needed to perform these tests is a natural

generalization from the independence model. More specifically, if we are given an

itemset X , consider a partition

P 1 , ... , P M of X , with i = 1 P i

P =

=

X , and

P i ∩

j . Under this model, we expect the support of an itemset to be

equal to the product of the frequencies of its parts, i.e., i = 1 fr ( P i ). It is easy to see

that for the maximal partition, when the partition contains only blocks of size 1, the

model becomes equal to the independence model.

We can now compare the expected values and the observations in the same way we

compared when were dealing with the independence model. If the partition contains

only 2 blocks, M

P j

=∅

for i

=

2, we can use Fisher's exact test [ 19 ]. While not monotonic,

Hamalainen [ 25 ] recently gave a practical bound that allows to prune large parts of

the search space.

To use the partition model we need to choose a partition. To do so, we can either

construct a global model, i.e., choose a fixed partition of

=

, or we can construct a

local model in which the actual partition depends on the itemset X . As an example of

the latter case we can consider find the partition of size 2 that best fits the observed

frequency [ 72 ].

I

4.2.2

Bayesian Networks

Another natural extension of the independence model are Bayesian networks , where

dependencies between items are expressed by a directed acyclic graph. In general,

computing an expected support from a global Bayesian network is NP-hard problem,

however it is possible to use the network structure to your advantage [ 15 ]. Additional

speed-ups are possible if we rank itemsets in one batch which allows us to use share

some computations [ 34 ].

Clearly, the partition model is mostly a practical choice with regard to com-

putability and allowing the Fisher test; it does not allow us to incorporate much more

knowledge than the independence model. Bayesian networks are very powerful, on

Frequent Pattern Mining

Search WWH ::

Custom Search

Home