Unlike with the transaction-based MaxEnt model with itemsets as constraints, discovering the MaxEnt model for a set of tiles can be done in polynomial time. The reason for this is that tile constraints allow us to factorize the model into a product of individual cells, which in turn allows us to compute the expectations E[S_T] efficiently.
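To illustrate the factorization, here is a minimal sketch, assuming S_T denotes the number of 1s inside a tile T (an assumption about the notation) and that per-cell probabilities p[i, j] are available from the factorized model: the expectation then reduces to a sum of per-cell probabilities over the tile. The function name is illustrative, not from the source.

```python
import numpy as np

def tile_expectation(p, rows, cols):
    """Expected number of 1s inside the tile T = rows x cols,
    under a model that factorizes into independent cells with
    probabilities p[i, j]. By linearity of expectation this is
    simply the sum of the cell probabilities inside the tile."""
    return p[np.ix_(rows, cols)].sum()
```

Because the model factorizes over cells, no inference over whole datasets is needed: the cost is linear in the number of cells of the tile.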
We can use the Rasch model to rank tiles based on the likelihood, that is, the
probability that a random dataset will obtain the same values in T as the original
dataset. We can show that this is equal to
\[
\prod_{(i,j) \in T} p^{\mathrm{emp}}\left(R_{ij} = D_{ij}\right),
\]
where D_ij is the (i, j)th entry of the input dataset and R_ij is the variable representing the (i, j)th entry in a random dataset. The smaller the probability, the more surprising the tile is according to the Rasch model. Unfortunately, this measure is monotonically decreasing. Consequently, the most interesting tile will contain the whole dataset. In order to remedy this problem, Kontonasios and De Bie [37] propose a normalization approach inspired by the Minimum Description Length principle [58], dividing the log-likelihood of the tile by its description length, roughly equal to the size of the transaction set plus the size of the itemset.
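The scoring just described can be sketched as follows. This is only an illustration under stated assumptions: the Rasch cell probabilities are taken to be sigmoid(a_i + b_j) with row parameters a and column parameters b, and the description length is approximated by the number of rows plus the number of columns of the tile, as the text suggests; all function names are hypothetical.

```python
import numpy as np

def rasch_probs(a, b):
    """Per-cell probabilities p[i, j] = sigmoid(a_i + b_j) of a
    Rasch-style model (a: row parameters, b: column parameters)."""
    z = a[:, None] + b[None, :]
    return 1.0 / (1.0 + np.exp(-z))

def tile_log_likelihood(D, p, rows, cols):
    """Log-probability that a random dataset agrees with D on every
    cell of the tile T = rows x cols."""
    sub_d = D[np.ix_(rows, cols)]
    sub_p = p[np.ix_(rows, cols)]
    cell = np.where(sub_d == 1, sub_p, 1.0 - sub_p)
    return np.log(cell).sum()

def normalized_score(D, p, rows, cols):
    """MDL-inspired normalization sketch: negative log-likelihood
    divided by a rough description length (rows plus columns)."""
    return -tile_log_likelihood(D, p, rows, cols) / (len(rows) + len(cols))
```

Note how the raw log-likelihood only decreases as the tile grows, whereas dividing by the description length penalizes large tiles that are only mildly surprising.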
4.4 Randomization Approaches
So far, we have computed expected frequencies by explicitly inferring the underlying distribution. However, we can avoid this by sampling datasets.
More formally, let Ω be the set of all possible binary datasets of size N × K, that is, Ω = {0, 1}^{N×K}. Many of these datasets are not realistic; for example, Ω contains a dataset full of 1s. Hence, we restrict our attention to datasets that have the same characteristics as the input dataset. One particularly simple set of statistics is the set containing row and column margins. Assume that we have computed the number of 1s in each row and column. Let us write c_i for the number of 1s in the ith column, and r_j for the number of 1s in the jth row. Now define Ω' to be the subset containing only the datasets whose column margins correspond to {c_i} and whose row margins correspond to {r_j}. Consider a uniform distribution over Ω'. We can now use this distribution to compute the expected value of a pattern. Such a distribution is closely related to the Rasch models explained previously. However, there are some technical differences. Datasets sampled from Ω' are forced to have the given row and column margins exactly, while with Rasch models row and column margins are only forced on average. This, however, comes at a cost.

The uniform distribution over Ω' is very complex and, unlike the Rasch model, cannot be used directly. To remedy this problem we will have to sample datasets. Sampling from scratch is very difficult, hence we will use an MCMC approach [23, 29]. Given a dataset D from Ω', we first sample two columns, i and j, and two rows, x and y. If it happens that, out of the four values D_ix, D_jx, D_iy, and D_jy, two are
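The passage is cut off above, so the exact condition is not stated here; assuming the step is the standard margin-preserving swap of [23, 29] (two of the four values equal 1 and sit on a diagonal of the 2 × 2 submatrix), one MCMC step can be sketched as follows. The function name and in-place update are illustrative choices.

```python
import numpy as np

def swap_step(D, rng):
    """One MCMC swap attempt on a binary dataset D (rows x columns).
    Sample two rows x, y and two columns i, j; if the four cells form
    a checkerboard (1 0 / 0 1 or 0 1 / 1 0), swap the two columns
    within both rows. The swap preserves every row and column margin,
    so the chain stays inside the set of datasets with fixed margins."""
    n, k = D.shape
    x, y = rng.choice(n, size=2, replace=False)
    i, j = rng.choice(k, size=2, replace=False)
    if D[x, i] == D[y, j] and D[x, j] == D[y, i] and D[x, i] != D[x, j]:
        D[x, i], D[x, j] = D[x, j], D[x, i]
        D[y, i], D[y, j] = D[y, j], D[y, i]
    return D
```

Running many such steps from the input dataset yields (approximately) uniform samples with the same margins, from which expected pattern frequencies can be estimated empirically.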