Uncertain Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

Table 14.5 Prefixed item caps for the probabilistic dataset D 2 of uncertain data

Transaction ID

Set of items w/existential probability (& prefixed item cap)

t 1

{ a :0.2 (0.2), b :0.9 (0.18), c :0.4 (0.36)}

t 2

{ a :0.6 (0.6), b :0.6 (0.36), c :0.6 (0.36), d :0.9 (0.54)}

t 3

{ a :0.6 (0.6), b :0.5 (0.30), d :0.5 (0.30), e :0.7 (0.42)}

t 4

{ a :0.9 (0.9), b :0.2 (0.18), c :0.8 (0.72), e :0.3 (0.27)}

5.4

PUF-growth

Along this direction, Leung and Tanbeer [ 40 ] observed that (i) the transaction cap pro-

vides CUF-growth with an upper bound to expected support of patterns and (ii) such

an upper bound can be tightened in a tree-based environment. They introduced the

concept of a prefixed item cap , which can be defined as follows.

Definition 14.3 The prefixed item cap —denoted by I Cap ( x r , t i )—of an item x r

in a transaction t i

represent

the length of t i ), is defined as the product of P ( x r , t i ) and the highest existential

probability value M of items from x 1 to x r − 1 in t i (i.e., in the proper prefix of x r in

t i ). More formally,

x 1 , ... , x r , ... , x h }

, where 1

≤

h (i.e., h =

t i |

P ( x r , t i )

t i |

> 1

PIcap ( x r , t i )

(14.5)

t i |=

1 (i.e., t i =

P ( x 1 , t i )

{ x 1 })

where M

max q ∈ [1, r − 1] P ( x q , t i ).

Assume that items are arranged in the order

from the root to leaves.

Then, Table 14.5 shows the prefixed item cap for every item in a transaction in

a probabilistic dataset D 2 of uncertain data. See Fig. 14.6 for how these prefixed

item caps are captured in a new tree structure called PUF-tree , from which the

corresponding algorithm called PUF-growth mines uncertain frequent patterns.

Like UFP-growth and CUF-growth, the PUF-growth algorithm also takes three

scans of the probabilistic dataset of uncertain data to mine frequent patterns. With the

first scan, PUF-growth computes the prefixed item caps. With the second scan, PUF-

growth builds a PUF-tree to capture (i) an item and (ii) its corresponding prefixed

item cap. Like those in CUF-tree, paths in the PUF-tree are shared if the nodes on

these paths share the same item. Hence, the resulting PUF-tree is of the same size

as the CUF-tree (also for capturing uncertain data), which can be as compact as the

FP-tree (for capturing precise data). The header table associated with the PUF-tree

gives the expected support of frequent 1-itemsets (i.e., singleton patterns or frequent

items). The prefixed item caps in the PUF-tree provide upper bounds to the expected

support of k -itemsets (for k

a , b , c , d , e

2). For any k -itemset X , if the upper bound to its

expected support is less than minsup , then X can be safely pruned.

By extracting appropriate tree paths and constructing PUF-trees for subsequent

projected databases, PUF-growth finds all potentially frequent patterns at the end of

the second scan of the probabilistic dataset of uncertain data. As these potentially

frequent patterns include all truly frequent patterns and some infrequent patterns

≥

Frequent Pattern Mining

Search WWH ::

Custom Search

Home