Database Reference
In-Depth Information
Table 14.5 Prefixed item caps for the probabilistic dataset D 2 of uncertain data
Transaction ID
Set of items w/existential probability (& prefixed item cap)
t 1
{ a :0.2 (0.2), b :0.9 (0.18), c :0.4 (0.36)}
t 2
{ a :0.6 (0.6), b :0.6 (0.36), c :0.6 (0.36), d :0.9 (0.54)}
t 3
{ a :0.6 (0.6), b :0.5 (0.30), d :0.5 (0.30), e :0.7 (0.42)}
t 4
{ a :0.9 (0.9), b :0.2 (0.18), c :0.8 (0.72), e :0.3 (0.27)}
5.4
PUF-growth
Along this direction, Leung and Tanbeer [ 40 ] observed that (i) the transaction cap pro-
vides CUF-growth with an upper bound to expected support of patterns and (ii) such
an upper bound can be tightened in a tree-based environment. They introduced the
concept of a prefixed item cap , which can be defined as follows.
Definition 14.3 The prefixed item cap —denoted by I Cap ( x r , t i )—of an item x r
in a transaction t i
represent
the length of t i ), is defined as the product of P ( x r , t i ) and the highest existential
probability value M of items from x 1 to x r 1 in t i (i.e., in the proper prefix of x r in
t i ). More formally,
={
x 1 , ... , x r , ... , x h }
, where 1
r
h (i.e., h =
|
t i |
P ( x r , t i )
×
M
if
|
t i |
> 1
PIcap ( x r , t i )
=
(14.5)
|
t i |=
1 (i.e., t i =
P ( x 1 , t i )
if
{ x 1 })
where M
max q [1, r 1] P ( x q , t i ).
Assume that items are arranged in the order
=
from the root to leaves.
Then, Table 14.5 shows the prefixed item cap for every item in a transaction in
a probabilistic dataset D 2 of uncertain data. See Fig. 14.6 for how these prefixed
item caps are captured in a new tree structure called PUF-tree , from which the
corresponding algorithm called PUF-growth mines uncertain frequent patterns.
Like UFP-growth and CUF-growth, the PUF-growth algorithm also takes three
scans of the probabilistic dataset of uncertain data to mine frequent patterns. With the
first scan, PUF-growth computes the prefixed item caps. With the second scan, PUF-
growth builds a PUF-tree to capture (i) an item and (ii) its corresponding prefixed
item cap. Like those in CUF-tree, paths in the PUF-tree are shared if the nodes on
these paths share the same item. Hence, the resulting PUF-tree is of the same size
as the CUF-tree (also for capturing uncertain data), which can be as compact as the
FP-tree (for capturing precise data). The header table associated with the PUF-tree
gives the expected support of frequent 1-itemsets (i.e., singleton patterns or frequent
items). The prefixed item caps in the PUF-tree provide upper bounds to the expected
support of k -itemsets (for k
a , b , c , d , e
2). For any k -itemset X , if the upper bound to its
expected support is less than minsup , then X can be safely pruned.
By extracting appropriate tree paths and constructing PUF-trees for subsequent
projected databases, PUF-growth finds all potentially frequent patterns at the end of
the second scan of the probabilistic dataset of uncertain data. As these potentially
frequent patterns include all truly frequent patterns and some infrequent patterns
 
Search WWH ::




Custom Search