Database Reference
In-Depth Information
4.2
Beyond Independence
While the independence model has many positive aspects, such as ease of compu-
tation, intuitive results, as well as interpretability, it is also fair to say it is overly
simplistic: it is naive to assume all item occurrences are independent. In practice, we
may want to take known interaction into account as background knowledge.
4.2.1
Partition Models
With the goal of mining interesting associations, Webb [ 73 ] discusses 6 principles
for identifying itemsets that are unlikely to be interesting, and to this end proposes
to check whether the frequency of an itemset X can either be closely determined by
assuming independence between any of its partitions, or by the frequency of any of
the supersets of X .
The so-called partition model which is needed to perform these tests is a natural
generalization from the independence model. More specifically, if we are given an
itemset X , consider a partition
P 1 , ... , P M of X , with i = 1 P i
P =
=
X , and
P i
j . Under this model, we expect the support of an itemset to be
equal to the product of the frequencies of its parts, i.e., i = 1 fr ( P i ). It is easy to see
that for the maximal partition, when the partition contains only blocks of size 1, the
model becomes equal to the independence model.
We can now compare the expected values and the observations in the same way we
compared when were dealing with the independence model. If the partition contains
only 2 blocks, M
P j
=∅
for i
=
2, we can use Fisher's exact test [ 19 ]. While not monotonic,
Hamalainen [ 25 ] recently gave a practical bound that allows to prune large parts of
the search space.
To use the partition model we need to choose a partition. To do so, we can either
construct a global model, i.e., choose a fixed partition of
=
, or we can construct a
local model in which the actual partition depends on the itemset X . As an example of
the latter case we can consider find the partition of size 2 that best fits the observed
frequency [ 72 ].
I
4.2.2
Bayesian Networks
Another natural extension of the independence model are Bayesian networks , where
dependencies between items are expressed by a directed acyclic graph. In general,
computing an expected support from a global Bayesian network is NP-hard problem,
however it is possible to use the network structure to your advantage [ 15 ]. Additional
speed-ups are possible if we rank itemsets in one batch which allows us to use share
some computations [ 34 ].
Clearly, the partition model is mostly a practical choice with regard to com-
putability and allowing the Fisher test; it does not allow us to incorporate much more
knowledge than the independence model. Bayesian networks are very powerful, on
Search WWH ::




Custom Search