Database Reference
In-Depth Information
Table 14.10 An example of a
probabilistic dataset D 4 with
mutually exclusive items
Transaction ID Set of items with existential probability
t 1 { j 1 :0.5, j 2 :0.5}
t 2 { k 1 :0.1, k 2 :0.2, k 3 :0.3, k 4 :0.4}
t 3 { j 1 :0.3, j 2 :0.3, j 3 :0.1}
Observation 1: The sum of existential probability of all items in each transaction is bounded above
by 1.
Observation 2: There are 60 “possible worlds” for D 4 (cf. 512 “possible worlds” if items in each
transaction were independent (i.e., not mutually exclusive)).
in W j ), which represents the probability of x being frequent exceeding the user
expectation.
Equivalently, given (i) a probabilistic dataset D of uncertain data, (ii) a user-
specified support threshold minsup , (iii) a user-specified frequentness probability
threshold minProb , the research problem of mining probabilistic heavy hitters
(PHHs) from uncertain data is to find every item x that is highly likely to be
frequent—i.e., the probability that x occurs in at least minsup transactions in D is no
less than minProb . In other words, x is a probabilistic heavy hitter if P ( sup ( x , D )
minsup ) > minProb .
To find these PHHs from a probabilistic dataset of uncertain data where items in
each transaction are mutually inclusive (e.g., D 4 in Table 14.10 ), Zhang et al. [ 56 ]
proposed an exact algorithm and an approximate algorithm. The exact algorithm uses
dynamic programming to mine offline uncertain data for PHHs. Such an algorithm
runs in polynomial time when there is sufficient memory. When the memory is
limited, the approximate algorithm uses sampling techniques to mine streaming
uncertain data for approximate PHHs.
11.2
Mining Probabilistic Frequent Patterns
The expected support of a pattern X (that consists of one or more items) provides users
with an estimate of the frequency of X , but it does not take into account the variance
or the probability distribution of the support of X . In some applications, knowing
the confidence on which pattern is highly likely to be frequent helps interpreting
patterns mined from uncertain data. Hence, Bernecker et al. [ 15 ] extended the notion
of frequent patterns and introduced the research problem of mining probabilistic
frequent patterns (p-FPs). Figure 14.11 illustrates the differences between expected
support and probabilistic support.
Definition 14.5 Given (i) a probabilistic dataset D of uncertain data, (ii) a user-
specified support threshold minsup , (iii) a user-specified frequentness probability
threshold minProb , the research problem of mining probabilistic frequent patterns
(p-FPs) from uncertain data is to find (i) all patterns that are highly likely to
be frequent and (ii) their support. Here, the support sup ( X , D ) of any pattern X
is defined by a discrete probability distribution function (pdf) or probability mass
 
Search WWH ::




Custom Search