Database Reference
In-Depth Information
such as iterative scaling or a gradient descent [ 16 ]. The computational bottleneck
in these methods is checking the constraints, namely computing the mean ES, a
procedure that may take O (
= O (2 K ) time. As such, solving the MaxEnt problem
in general for a given set of itemset constraints is computationally infeasible [ 63 ].
However, let us recall that in this section we are computing the distribution only to
rank itemsets. Hence, we can limit ourselves by considering constraints defined only
on items in X , effectively ignoring any item outside X [ 64 , 46 , 53 ]. This effectively
brings down the computational complexity down to a much more accessible O (2 | X | ).
| |
)
4.3.2
MaxEnt and Derivability
Once inferred, we can compare the expectation to the observed supports using the
same techniques as we developed above for the independence model. Moreover,
there exists an interesting connection between the MaxEnt model and derivability
of the support of an itemset. More specifically, for a given itemset X , the MaxEnt
model derived using proper subsets of X shares a connection with the concept of non-
derivable itemsets. An itemset is derivable if and only if its frequency can be deduced
from the frequencies of its subsets. This is only possible when any distribution p
Q
produces the same expectation E p [ S X ] as the observed support. This immediately
implies that the expected support according to p emp is exactly the same as observed
support. In summary, if an itemset is derivable, then a MaxEnt model derived from
its subsets will produce the same expectation as the observed support.
4.3.3
MaxEnt Models for Whole Databases
So far we have considered only models on individual transactions. Alternatively, we
can consider models on whole datasets, that is, instead of assigning probabilities to
individual transactions, we assign probability to whole datasets. The space on which
distribution is defined is now
K , where K is the number of items and N
is the number of transactions, that is, contains all possible binary datasets of size
N
N
×
={
0, 1
}
K . Note that under this model N is fixed along with K , where as in transaction-
based model only K is fixed and we consider dataset to be N i.i.d. samples. That is,
while above we considered the data to be a bag of i.i.d. samples, we here assume
the whole dataset to be one single sample. As such, different from the setting above,
here which rows support an itemset are also considered to be of interest—and hence,
the background knowledge should not just contain patterns, but also their row-sets.
In other words, we can use tiles as constraints. Given a tile T , let us write S T ( D )
for the number of 1s in entries of D corresponding to T . The mean E [ S T ] is then the
expected number of1sin T . Note that we can easily model column margins using
tiles, simply by creating K tiles, i th tile containing i th column and every row. We
can similarly create row margins. The maximum entropy model derived from both
rows and margins is known as Rasch model [ 57 ].
×
Search WWH ::




Custom Search