Uncertain Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 14.10 An example of a

probabilistic dataset D 4 with

mutually exclusive items

Transaction ID Set of items with existential probability

t 1 { j 1 :0.5, j 2 :0.5}

t 2 { k 1 :0.1, k 2 :0.2, k 3 :0.3, k 4 :0.4}

t 3 { j 1 :0.3, j 2 :0.3, j 3 :0.1}

Observation 1: The sum of existential probability of all items in each transaction is bounded above

by 1.

Observation 2: There are 60 “possible worlds” for D 4 (cf. 512 “possible worlds” if items in each

transaction were independent (i.e., not mutually exclusive)).

in W j ), which represents the probability of x being frequent exceeding the user

expectation.

Equivalently, given (i) a probabilistic dataset D of uncertain data, (ii) a user-

specified support threshold minsup , (iii) a user-specified frequentness probability

threshold minProb , the research problem of mining probabilistic heavy hitters

(PHHs) from uncertain data is to find every item x that is highly likely to be

frequent—i.e., the probability that x occurs in at least minsup transactions in D is no

less than minProb . In other words, x is a probabilistic heavy hitter if P ( sup ( x , D )

≥

minsup ) > minProb .

To find these PHHs from a probabilistic dataset of uncertain data where items in

each transaction are mutually inclusive (e.g., D 4 in Table 14.10 ), Zhang et al. [ 56 ]

proposed an exact algorithm and an approximate algorithm. The exact algorithm uses

dynamic programming to mine offline uncertain data for PHHs. Such an algorithm

runs in polynomial time when there is sufficient memory. When the memory is

limited, the approximate algorithm uses sampling techniques to mine streaming

uncertain data for approximate PHHs.

11.2

Mining Probabilistic Frequent Patterns

The expected support of a pattern X (that consists of one or more items) provides users

with an estimate of the frequency of X , but it does not take into account the variance

or the probability distribution of the support of X . In some applications, knowing

the confidence on which pattern is highly likely to be frequent helps interpreting

patterns mined from uncertain data. Hence, Bernecker et al. [ 15 ] extended the notion

of frequent patterns and introduced the research problem of mining probabilistic

frequent patterns (p-FPs). Figure 14.11 illustrates the differences between expected

support and probabilistic support.

Definition 14.5 Given (i) a probabilistic dataset D of uncertain data, (ii) a user-

specified support threshold minsup , (iii) a user-specified frequentness probability

threshold minProb , the research problem of mining probabilistic frequent patterns

(p-FPs) from uncertain data is to find (i) all patterns that are highly likely to

be frequent and (ii) their support. Here, the support sup ( X , D ) of any pattern X

is defined by a discrete probability distribution function (pdf) or probability mass

Frequent Pattern Mining

Search WWH ::

Custom Search

Home