3 Candidate Generate-and-Test Based Uncertain Frequent Pattern Mining
One way to mine frequent patterns from uncertain data is to apply the candidate generate-and-test paradigm. For example, Chui et al. [19] proposed the U-Apriori algorithm, which mines frequent patterns from uncertain data in a levelwise, breadth-first, bottom-up fashion. Specifically, U-Apriori first computes the expected support of all domain items. Those items with expected support ≥ minsup become frequent patterns consisting of 1 item (i.e., frequent 1-itemsets). Afterwards, the U-Apriori algorithm repeatedly applies the candidate generate-and-test process to generate candidate (k+1)-itemsets from frequent k-itemsets and tests whether they are frequent (k+1)-itemsets. Like its counterpart for mining precise data (the Apriori algorithm [8]), U-Apriori also relies on the Apriori property (which is also known as the anti-monotonic property or the downward closure property): all subsets of a frequent pattern must also be frequent. Equivalently, all supersets of any infrequent pattern are also infrequent.
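The levelwise process above can be sketched in a few lines. The following is a minimal illustration, not the published implementation: the database, items, and minsup value are invented for the example, and expected support is computed under the usual independence assumption (the sum over transactions of the product of the itemset's existential probabilities).

```python
from itertools import combinations

# Hypothetical uncertain database: each transaction maps items to
# existential probabilities (names and values are illustrative only).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.6},
    {"b": 0.9, "c": 0.5},
]
minsup = 1.0  # expected-support threshold (illustrative)

def exp_sup(itemset, db):
    """Expected support: sum over transactions of the product of the
    existential probabilities of the itemset's items."""
    total = 0.0
    for t in db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)
        total += p
    return total

def u_apriori(db, minsup):
    items = sorted({i for t in db for i in t})
    # Frequent 1-itemsets: items whose expected support reaches minsup.
    freq = [frozenset([i]) for i in items if exp_sup([i], db) >= minsup]
    result = list(freq)
    k = 1
    while freq:
        # Generate candidate (k+1)-itemsets from unions of frequent k-itemsets.
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Apriori property: every k-subset of a candidate must be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # Test step: keep candidates whose expected support reaches minsup.
        freq = [c for c in cands if exp_sup(c, db) >= minsup]
        result += freq
        k += 1
    return result
```

On this toy data, {a}, {b}, and {a, b} come out frequent, while {c} (expected support 0.8) is pruned at level 1, so no candidate containing c is ever generated.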
U-Apriori improves its efficiency by incorporating the LGS-trimming strategy (which includes local trimming, global pruning, and single-pass patch up) [19]. This strategy trims away every item with an existential probability below the user-specified trimming threshold (which is local to each item) from the original probabilistic dataset D of uncertain data and then mines frequent patterns from the resulting trimmed dataset D_Trim. On the one hand, if a pattern X is frequent in D_Trim, then X must be frequent in D. On the other hand, a pattern Y is infrequent in D if expSup(Y, D_Trim) + e(Y) < minsup, where e(Y) is an upper bound on the estimation error of expSup(Y, D_Trim). Such an infrequent pattern Y can be pruned. Moreover, a pattern Z is potentially frequent in D if expSup(Z, D_Trim) ≤ minsup ≤ expSup(Z, D_Trim) + e(Z). To patch up, U-Apriori recovers the missing frequent patterns by verifying the expected supports of potentially frequent patterns with an additional single-pass scan of D.
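The three cases above form a simple three-way decision per pattern. A minimal sketch of that decision logic follows; the function name and arguments are illustrative, and the actual error bounds e(·) in [19] are derived from the trimmed items' probabilities, which is not reproduced here.

```python
def classify(exp_sup_trim, err_bound, minsup):
    """Classify a pattern from its expected support in the trimmed
    dataset D_Trim and an upper bound err_bound on the estimation
    error (illustrative sketch of the LGS decision logic)."""
    if exp_sup_trim >= minsup:
        # Frequent in D_Trim implies frequent in the original dataset D.
        return "frequent"
    if exp_sup_trim + err_bound < minsup:
        # Even the best case cannot reach minsup: safely pruned.
        return "infrequent"
    # Otherwise the pattern must be verified in the patch-up scan of D.
    return "potentially frequent"
```

Only patterns in the third bucket incur the extra single-pass scan, which is why the quality of the error bound directly affects the patch-up overhead.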
Although the LGS strategy improves the efficiency of U-Apriori, the algorithm still suffers from the following problems: (i) there is an overhead in creating D_Trim, (ii) only a subset of all the frequent patterns can be mined from D_Trim and there is overhead in patching up, (iii) the efficiency of the algorithm is sensitive to the percentage of items having low existential probabilities, and (iv) it is not easy to find an appropriate value for the user-specified trimming threshold.
Chui and Kao [18] applied the decremental pruning technique to further improve the efficiency of U-Apriori. The technique helps reduce the number of candidate patterns by progressively estimating the upper bounds of the expected support of candidate patterns after each transaction is processed. If the estimated upper bound of a candidate pattern X falls below minsup, then X is immediately pruned.
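The per-transaction bound can be sketched as follows. This is a deliberately simplified version: it bounds a candidate's remaining contribution by 1 per unprocessed transaction (the technique in [18] derives tighter, probability-aware estimates), and the sample database and names are invented for the example.

```python
def decremental_scan(db, candidates, minsup):
    """Sketch of decremental pruning: after each transaction, a candidate's
    best-case expected support is its accumulated expected support plus the
    number of unprocessed transactions (each contributes at most 1).
    Candidates whose upper bound falls below minsup are pruned mid-scan."""
    acc = {c: 0.0 for c in candidates}
    alive = set(candidates)
    for idx, t in enumerate(db):
        remaining = len(db) - idx - 1
        for c in list(alive):
            # Accumulate this transaction's contribution to expSup(c).
            p = 1.0
            for item in c:
                p *= t.get(item, 0.0)
            acc[c] += p
            if acc[c] + remaining < minsup:
                alive.discard(c)  # upper bound below minsup: prune now
    # Survivors that actually reached minsup are frequent candidates.
    return {c: acc[c] for c in alive if acc[c] >= minsup}

# Hypothetical uncertain database (illustrative values).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.6},
    {"b": 0.9, "c": 0.5},
]
```

Here {a, c} accumulates only 0.27 of expected support, so its upper bound drops below a minsup of 1.0 before the scan finishes and it is discarded without a full support count, while {a, b} survives with expected support 1.14.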