Unlike with the transaction-based MaxEnt model with itemsets as constraints, discovering the MaxEnt model for a set of tiles can be done in polynomial time. The reason for this is that tile constraints allow us to factorize the model into a product of individual cells, which in turn allows us to compute the expectations E[S_T] efficiently.
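To illustrate the factorization, here is a minimal sketch, assuming S_T denotes the number of 1s inside a tile T (an assumption about the notation) and that per-cell probabilities p[i, j] are available from the factorized model: the expectation then reduces to a sum of per-cell probabilities over the tile. The function name is illustrative, not from the source.

```python
import numpy as np

def tile_expectation(p, rows, cols):
    """Expected number of 1s inside the tile T = rows x cols,
    under a model that factorizes into independent cells with
    probabilities p[i, j]. By linearity of expectation this is
    simply the sum of the cell probabilities inside the tile."""
    return p[np.ix_(rows, cols)].sum()
```

Because the model factorizes over cells, no inference over whole datasets is needed: the cost is linear in the number of cells of the tile.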
We can use the Rasch model to rank tiles based on the likelihood, that is, the
probability that a random dataset will obtain the same values in T as the original
dataset. We can show that this is equal to
\[
\prod_{(i,j) \in T} p^{\mathrm{emp}}\left(R_{ij} = D_{ij}\right),
\]
where D_ij is the (i, j)th entry of the input dataset and R_ij is the variable representing the (i, j)th entry in a random dataset. The smaller the probability, the more surprising the tile is according to the Rasch model. Unfortunately, this measure is monotonically decreasing. Consequently, the most interesting tile will contain the whole dataset. In order to remedy this problem, Kontonasios and De Bie [37] propose a normalization approach inspired by the Minimum Description Length principle [58], dividing the log-likelihood of the tile by its description length, roughly equal to the size of the transaction set plus the size of the itemset.
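The scoring just described can be sketched as follows. This is only an illustration under stated assumptions: the Rasch cell probabilities are taken to be sigmoid(a_i + b_j) with row parameters a and column parameters b, and the description length is approximated by the number of rows plus the number of columns of the tile, as the text suggests; all function names are hypothetical.

```python
import numpy as np

def rasch_probs(a, b):
    """Per-cell probabilities p[i, j] = sigmoid(a_i + b_j) of a
    Rasch-style model (a: row parameters, b: column parameters)."""
    z = a[:, None] + b[None, :]
    return 1.0 / (1.0 + np.exp(-z))

def tile_log_likelihood(D, p, rows, cols):
    """Log-probability that a random dataset agrees with D on every
    cell of the tile T = rows x cols."""
    sub_d = D[np.ix_(rows, cols)]
    sub_p = p[np.ix_(rows, cols)]
    cell = np.where(sub_d == 1, sub_p, 1.0 - sub_p)
    return np.log(cell).sum()

def normalized_score(D, p, rows, cols):
    """MDL-inspired normalization sketch: negative log-likelihood
    divided by a rough description length (rows plus columns)."""
    return -tile_log_likelihood(D, p, rows, cols) / (len(rows) + len(cols))
```

Note how the raw log-likelihood only decreases as the tile grows, whereas dividing by the description length penalizes large tiles that are only mildly surprising.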
4.4 Randomization Approaches
So far, we have computed expected frequencies by explicitly inferring the underlying distribution. However, we can avoid this by sampling datasets.
More formally, let Ω be the set of all possible binary datasets of size N × K, that is, Ω = {0, 1}^{N×K}. Many of these datasets are not realistic; for example, Ω contains a dataset full of 1s. Hence, we restrict our attention to datasets that have the same characteristics as the input dataset. One particularly simple set of statistics is the set containing row and column margins. Assume that we have computed the number of 1s in each row and column. Let us write c_i for the number of 1s in the ith column, and r_j for the number of 1s in the jth row. Now define Ω' to be the subset containing only the datasets whose column margins correspond to {c_i} and whose row margins correspond to {r_j}. Consider a uniform distribution over Ω'. We can now use this distribution to compute the expected value of a pattern. Such a distribution is closely related to the Rasch models explained previously. However, there are some technical differences. Datasets sampled from Ω' are forced to have the given row and column margins exactly, while with Rasch models row and column margins are only forced on average. This, however, comes at a cost.

The uniform distribution over Ω' is very complex and, unlike the Rasch model, cannot be used directly. To remedy this problem we will have to sample datasets. Sampling from scratch is very difficult, hence we will use an MCMC approach [23, 29]. Given a dataset D from Ω', we first sample two columns, i and j, and two rows, x and y. If it happens that, out of the four values D_ix, D_jx, D_iy, and D_jy, two are
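The passage is cut off above, so the exact condition is not stated here; assuming the step is the standard margin-preserving swap of [23, 29] (two of the four values equal 1 and sit on a diagonal of the 2 × 2 submatrix), one MCMC step can be sketched as follows. The function name and in-place update are illustrative choices.

```python
import numpy as np

def swap_step(D, rng):
    """One MCMC swap attempt on a binary dataset D (rows x columns).
    Sample two rows x, y and two columns i, j; if the four cells form
    a checkerboard (1 0 / 0 1 or 0 1 / 1 0), swap the two columns
    within both rows. The swap preserves every row and column margin,
    so the chain stays inside the set of datasets with fixed margins."""
    n, k = D.shape
    x, y = rng.choice(n, size=2, replace=False)
    i, j = rng.choice(k, size=2, replace=False)
    if D[x, i] == D[y, j] and D[x, j] == D[y, i] and D[x, i] != D[x, j]:
        D[x, i], D[x, j] = D[x, j], D[x, i]
        D[y, i], D[y, j] = D[y, j], D[y, i]
    return D
```

Running many such steps from the input dataset yields (approximately) uniform samples with the same margins, from which expected pattern frequencies can be estimated empirically.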