Interesting Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

such as iterative scaling or a gradient descent [ 16 ]. The computational bottleneck

in these methods is checking the constraints, namely computing the mean ES, a

procedure that may take O (

= O (2 K ) time. As such, solving the MaxEnt problem

in general for a given set of itemset constraints is computationally infeasible [ 63 ].

However, let us recall that in this section we are computing the distribution only to

rank itemsets. Hence, we can limit ourselves by considering constraints defined only

on items in X , effectively ignoring any item outside X [ 64 , 46 , 53 ]. This effectively

brings down the computational complexity down to a much more accessible O (2 | X | ).

| |

)

4.3.2

MaxEnt and Derivability

Once inferred, we can compare the expectation to the observed supports using the

same techniques as we developed above for the independence model. Moreover,

there exists an interesting connection between the MaxEnt model and derivability

of the support of an itemset. More specifically, for a given itemset X , the MaxEnt

model derived using proper subsets of X shares a connection with the concept of non-

derivable itemsets. An itemset is derivable if and only if its frequency can be deduced

from the frequencies of its subsets. This is only possible when any distribution p

∈

Q

produces the same expectation E p [ S X ] as the observed support. This immediately

implies that the expected support according to p emp is exactly the same as observed

support. In summary, if an itemset is derivable, then a MaxEnt model derived from

its subsets will produce the same expectation as the observed support.

4.3.3

MaxEnt Models for Whole Databases

So far we have considered only models on individual transactions. Alternatively, we

can consider models on whole datasets, that is, instead of assigning probabilities to

individual transactions, we assign probability to whole datasets. The space on which

distribution is defined is now

K , where K is the number of items and N

is the number of transactions, that is, contains all possible binary datasets of size

N

×

={

0, 1

}

K . Note that under this model N is fixed along with K , where as in transaction-

based model only K is fixed and we consider dataset to be N i.i.d. samples. That is,

while above we considered the data to be a bag of i.i.d. samples, we here assume

the whole dataset to be one single sample. As such, different from the setting above,

here which rows support an itemset are also considered to be of interest—and hence,

the background knowledge should not just contain patterns, but also their row-sets.

In other words, we can use tiles as constraints. Given a tile T , let us write S T ( D )

for the number of 1s in entries of D corresponding to T . The mean E [ S T ] is then the

expected number of1sin T . Note that we can easily model column margins using

tiles, simply by creating K tiles, i th tile containing i th column and every row. We

can similarly create row margins. The maximum entropy model derived from both

rows and margins is known as Rasch model [ 57 ].

×

Frequent Pattern Mining

Search WWH ::

Custom Search

Home