Uncertain Frequent Pattern Mining - Frequent Pattern Mining - page 343

Database Reference

In-Depth Information

Fig. 14.4 The CUF-tree for

the probabilistic dataset D 2 of

uncertain data

Item

Expected support

a :2.04

a

2.3

b

2.2

b :2.04

c

1.8

d

1.4

e

1.0

c :1.62

d :0.42

e :0.72

e :0.42

d :0.54

Fig. 14.5 The FP-tree for the

traditional database D 1 of

precise data

compact as the FP-tree (for capturing precise data). See Fig. 14.4 , which shows the

CUF-tree for probabilistic dataset D 2 of uncertain data. Note that this CUF-tree is as

compact as an FP-tree (ref. Fig. 14.5 ) for the traditional database D 1 of precise data.

Like UFP-growth, the CUF-growth algorithm also takes three scans of the prob-

abilistic dataset of uncertain data to mine frequent patterns. CUF-growth first scans

the dataset to compute the transaction caps, and it then scans the dataset the second

time to build the CUF-tree. The header table associated with the CUF-tree gives the

expected support of frequent 1-itemsets (i.e., frequent singletons or frequent items).

The CUF-tree stores transaction caps, which provide upper bounds to the expected

support of frequent k -itemsets (for k

2). For any k -itemset X , if the upper bound

to its expected support is less than minsup , then X can be safely pruned.

By extracting appropriate tree paths and constructing CUF-trees for subsequent

projected databases, CUF-growth finds all potentially frequent patterns at the end of

the second scan of the probabilistic dataset of uncertain data. As these potentially

frequent patterns include all truly frequent patterns and some infrequent patterns

(i.e., some false positives ), CUF-growth then quickly scans the dataset the third time

to check each of them to verify whether or not they are truly frequent (i.e., prune

false positives).

≥

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home