the other hand, but also notoriously hard to infer from data. More importantly, they
can be very hard to interpret. Unless we use very simple networks, our model may
make inferences that are far from the intuition of the analyst, and hence prune
itemsets that are potentially interesting. Next we discuss a class of models that
can circumvent these problems.
4.3 Maximum Entropy Models
In general, for any knowledge that we may have about the data, there are potentially
infinitely many possible distributions that we can choose to test against: any distribu-
tion that satisfies our background knowledge goes. Clearly, however, not all of these
are an equally good choice. For example, say that all we know about a certain row
in the data is that it contains 10 ones out of a possible 100 items. Then, while not
incorrect, a distribution that puts all probability mass on exactly one configuration
(e.g., the first 10 items), and assigns probability 0 to all other configurations, does
make a choice that intuitively seems unwarranted given what we know. This raises
the question: how should we choose the distribution to test against?
The answer was given by Jaynes [35], who formulated the Maximum Entropy prin-
ciple. Loosely speaking, the MaxEnt principle states that given some background
knowledge, the best distribution is the one that (1) matches the background knowl-
edge, and (2) is otherwise as random as possible. It is exactly this distribution that
makes optimal use of the provided background knowledge, while making no further
assumptions.
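To make the principle concrete, the following is a minimal numerical sketch; the tiny dimension $K = 4$, the use of scipy's SLSQP solver, and all variable names are our own illustrative choices, not part of the text. Among all distributions over $\{0, 1\}^K$ with a fixed expected number of ones, the entropy maximizer turns out to be the factorized (independence) distribution:

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

# Toy illustration of the MaxEnt principle: over all distributions on
# {0,1}^K with a fixed expected number of ones, find the one with
# maximum entropy. K is kept tiny so we can enumerate the sample space.
K = 4
target_ones = 1.0  # desired expected number of ones per transaction

omega = np.array(list(product([0, 1], repeat=K)))  # all 2^K transactions
ones = omega.sum(axis=1)                           # ones per transaction

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))   # minimizing this maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},           # valid distribution
    {"type": "eq", "fun": lambda p: p @ ones - target_ones},  # background knowledge
]
p0 = np.full(len(omega), 1.0 / len(omega))
res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(omega),
               constraints=constraints)

# The MaxEnt solution factorizes: every item is 1 independently with
# probability target_ones / K = 0.25, so p(w) = 0.25^#ones * 0.75^(K-#ones).
expected = 0.25 ** ones * 0.75 ** (K - ones)
print(np.allclose(res.x, expected, atol=1e-3))  # True, up to solver tolerance
```

The constraint only fixes the expected row sum, so the entropy maximizer spreads probability as evenly as the constraint allows, which here means treating the items independently rather than committing all mass to one configuration.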
4.3.1 MaxEnt Models for Transactions
As an example, let us discuss the MaxEnt model for binary data, in which we can
incorporate frequencies of itemsets as background knowledge. Formally, let $K$ be
the number of items, and let $\Omega$ be the space of all possible transactions, that is,
$\Omega = \{0, 1\}^K$ is the set of binary vectors of length $K$. In order to compute the expected
support of an itemset, we need to infer a distribution, say $p$, over $\Omega$.
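As a small illustration of this setup (the independence model for $p$ and all concrete numbers below are assumptions made for the sake of the example, not from the text), we can enumerate $\Omega$ for a tiny $K$ and compute the expected frequency of an itemset under a given $p$:

```python
import numpy as np
from itertools import product

# Enumerate Omega = {0,1}^K and compute the expected frequency of an
# itemset under a distribution p over Omega.
K = 3
omega = list(product([0, 1], repeat=K))  # all 2^K possible transactions

# Example p: an independence model where item j is 1 with probability q[j].
q = np.array([0.5, 0.3, 0.2])
p = {w: np.prod([q[j] if w[j] else 1 - q[j] for j in range(K)])
     for w in omega}
assert abs(sum(p.values()) - 1.0) < 1e-12  # p is a valid distribution

# Expected frequency of itemset I = {1, 2}: sum of p(w) over all
# transactions w that contain both items.
I = (1, 2)
freq = sum(p[w] for w in omega if all(w[j] == 1 for j in I))
print(freq)  # 0.06 = 0.3 * 0.2 under the independence model
```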
Our next step is to put some constraints on what type of distributions we consider.
More formally, we assume that we are given a set of functions $S_1, \ldots, S_M$, with
$S_i : \Omega \to \{0, 1\}$,
accompanied with desired values $\theta_i$. Now let us consider a specific set of distributions,
namely
$$Q = \{\, p \mid E_p[S_i] = \theta_i, \; i = 1, \ldots, M \,\},$$
where $E_p[S_i]$ is the expected value of $S_i$ w.r.t. $p$,
$$E_p[S_i] = \sum_{\omega \in \Omega} p(\omega) S_i(\omega).$$
In other words, $Q$ consists of the distributions for which the average value of $S_i$ is equal
to $\theta_i$.
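A minimal sketch of this definition (the itemsets, the values $\theta_i$, and the helper names below are illustrative assumptions): we enumerate $\Omega$, take the $S_i$ to be itemset indicator functions, and test whether a candidate $p$ lies in $Q$.

```python
import numpy as np
from itertools import product

# Constraint set Q for K = 3 items, using itemset indicator functions:
# S_i(w) = 1 iff transaction w contains every item of itemset I_i.
K = 3
omega = np.array(list(product([0, 1], repeat=K)))  # sample space, 2^K rows

itemsets = [(0,), (1, 2)]   # I_1 = {a}, I_2 = {b, c}
thetas = [0.5, 0.25]        # desired values theta_i

def S(itemset, w):
    """Indicator: does transaction w contain every item of the itemset?"""
    return int(all(w[j] == 1 for j in itemset))

def in_Q(p, tol=1e-9):
    """Check E_p[S_i] = theta_i for all i, i.e. whether p lies in Q."""
    for itemset, theta in zip(itemsets, thetas):
        expectation = sum(p[k] * S(itemset, w) for k, w in enumerate(omega))
        if abs(expectation - theta) > tol:
            return False
    return True

uniform = np.full(len(omega), 1.0 / len(omega))
print(in_Q(uniform))  # True here: under the uniform distribution,
                      # E[S_1] = 1/2 and E[S_2] = 1/4 match the thetas
```

Note that $Q$ is typically far from a singleton: many distributions satisfy the constraints, which is precisely why the MaxEnt principle is needed to single one out.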