Database Reference
In-Depth Information
Fig. 14.4 The CUF-tree for
the probabilistic dataset D 2 of
uncertain data
Item
Expected support
a :2.04
a
2.3
b
2.2
b :2.04
c
1.8
d
1.4
e
1.0
c :1.62
d :0.42
e :0.72
e :0.42
d :0.54
Fig. 14.5 The FP-tree for the
traditional database D 1 of
precise data
compact as the FP-tree (for capturing precise data). See Fig. 14.4 , which shows the
CUF-tree for probabilistic dataset D 2 of uncertain data. Note that this CUF-tree is as
compact as an FP-tree (ref. Fig. 14.5 ) for the traditional database D 1 of precise data.
Like UFP-growth, the CUF-growth algorithm also takes three scans of the prob-
abilistic dataset of uncertain data to mine frequent patterns. CUF-growth first scans
the dataset to compute the transaction caps, and it then scans the dataset the second
time to build the CUF-tree. The header table associated with the CUF-tree gives the
expected support of frequent 1-itemsets (i.e., frequent singletons or frequent items).
The CUF-tree stores transaction caps, which provide upper bounds to the expected
support of frequent k -itemsets (for k
2). For any k -itemset X , if the upper bound
to its expected support is less than minsup , then X can be safely pruned.
By extracting appropriate tree paths and constructing CUF-trees for subsequent
projected databases, CUF-growth finds all potentially frequent patterns at the end of
the second scan of the probabilistic dataset of uncertain data. As these potentially
frequent patterns include all truly frequent patterns and some infrequent patterns
(i.e., some false positives ), CUF-growth then quickly scans the dataset the third time
to check each of them to verify whether or not they are truly frequent (i.e., prune
false positives).
Search WWH ::




Custom Search