3 Candidate Generate-and-Test Based Uncertain Frequent Pattern Mining
One way to mine frequent patterns from uncertain data is to apply the candidate
generate-and-test paradigm. For example, Chui et al. [19] proposed the U-Apriori
algorithm, which mines frequent patterns from uncertain data in a levelwise breadth-
first bottom-up fashion. Specifically, U-Apriori first computes the expected support
of all domain items. Those items with expected support ≥ minsup become frequent patterns each consisting of a single item (i.e., frequent 1-itemsets). Afterwards, the
U-Apriori algorithm repeatedly applies the candidate generate-and-test process to
generate candidate (k+1)-itemsets from frequent k-itemsets and test whether they are frequent (k+1)-itemsets. Like its counterpart for mining precise data (the Apriori algorithm [8]), U-Apriori also relies on the Apriori property (which is also known as the anti-monotonic property or the downward closure property) that all subsets of a
frequent pattern must also be frequent. Equivalently, all supersets of any infrequent
pattern are also infrequent.
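As a concrete illustration of this levelwise process, the following sketch assumes each uncertain transaction is represented as a dict mapping items to their existential probabilities, with items occurring independently within a transaction; the function names expected_support and u_apriori and the overall structure are ours, not the authors' implementation.

```python
from itertools import combinations

def expected_support(itemset, transactions):
    """Expected support of an itemset: the sum, over all transactions,
    of the product of the existential probabilities of its items
    (assuming items occur independently within a transaction)."""
    total = 0.0
    for tx in transactions:          # tx: dict mapping item -> probability
        prob = 1.0
        for item in itemset:
            prob *= tx.get(item, 0.0)
            if prob == 0.0:
                break
        total += prob
    return total

def u_apriori(transactions, minsup):
    """Levelwise, breadth-first, bottom-up mining of frequent patterns
    from uncertain data, in the spirit of U-Apriori."""
    items = sorted({item for tx in transactions for item in tx})
    # Frequent 1-itemsets: items whose expected support meets minsup.
    frequent = {}
    for i in items:
        sup = expected_support((i,), transactions)
        if sup >= minsup:
            frequent[(i,)] = sup
    result = dict(frequent)
    k = 1
    while frequent:
        # Candidate generation: join frequent k-itemsets sharing a
        # (k-1)-item prefix; the Apriori property discards any candidate
        # with an infrequent k-subset before counting.
        candidates = set()
        for a, b in combinations(sorted(frequent), 2):
            if a[:-1] == b[:-1]:
                cand = tuple(sorted(set(a) | set(b)))
                if all(sub in frequent for sub in combinations(cand, k)):
                    candidates.add(cand)
        # Candidate test: keep (k+1)-itemsets whose expected support
        # meets minsup.
        frequent = {}
        for cand in candidates:
            sup = expected_support(cand, transactions)
            if sup >= minsup:
                frequent[cand] = sup
        result.update(frequent)
        k += 1
    return result
```

For example, with transactions [{'a': 0.9, 'b': 0.6}, {'a': 0.5, 'b': 0.8}] and minsup = 1.0, both 1-itemsets qualify (expected support 1.4 each), while the candidate {a, b} is generated and then rejected because its expected support 0.9·0.6 + 0.5·0.8 = 0.94 falls below 1.0.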
U-Apriori improves its efficiency by incorporating the LGS-trimming strategy (which includes local trimming, global pruning, and single-pass patch up) [19]. This strategy trims away every item with an existential probability below the user-specified trimming threshold (which is local to each item) from the original probabilistic dataset D of uncertain data, and then mines frequent patterns from the resulting trimmed dataset D_Trim. On the one hand, if a pattern X is frequent in D_Trim, then X must be frequent in D. On the other hand, a pattern Y is infrequent in D if

expSup(Y, D_Trim) + e(Y) < minsup,

where e(Y) is an upper bound on the estimation error for expSup(Y, D_Trim). Such an infrequent pattern Y can be pruned. Moreover, a pattern Z is potentially frequent in D if

expSup(Z, D_Trim) ≤ minsup ≤ expSup(Z, D_Trim) + e(Z).
To patch up, U-Apriori recovers the missing frequent patterns by verifying expected
supports of potentially frequent patterns with an additional single-pass scan of D .
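Putting the three cases together, mining D_Trim classifies each pattern as frequent, infrequent, or potentially frequent. The helper below is a hypothetical illustration of that classification (not code from [19]); exp_sup_trimmed stands for expSup(X, D_Trim) and error_bound for the bound e(X).

```python
def classify(exp_sup_trimmed, error_bound, minsup):
    """Classify a pattern X after mining the trimmed dataset D_Trim,
    given expSup(X, D_Trim) and the error bound e(X)."""
    if exp_sup_trimmed >= minsup:
        # Frequent in D_Trim implies frequent in the original dataset D.
        return "frequent"
    if exp_sup_trimmed + error_bound < minsup:
        # Even with the maximum support lost to trimming, X cannot
        # reach minsup in D, so it is pruned.
        return "infrequent"
    # Otherwise X is potentially frequent: its true expected support
    # is verified during the single-pass patch-up scan of D.
    return "potentially frequent"
```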
Although the LGS-trimming strategy improves the efficiency of U-Apriori, the algorithm still suffers from the following problems: (i) there is an overhead in creating D_Trim, (ii) only a subset of all the frequent patterns can be mined from D_Trim, and the patch-up step incurs additional overhead, (iii) the efficiency of the algorithm is sensitive to the percentage of items having low existential probabilities, and (iv) it is not easy to find an appropriate value for the user-specified trimming threshold.
Chui and Kao [18] applied the decremental pruning technique to further improve the efficiency of U-Apriori. The technique helps reduce the number of candidate patterns by progressively refining upper bounds on the expected supports of candidate patterns as each transaction is processed. If the estimated upper bound of a candidate pattern X falls below minsup, then X is immediately pruned.
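A minimal sketch of this idea, under simplified assumptions, follows. The bound used here is deliberately loose: each unprocessed transaction is assumed to contribute at most 1 to a candidate's final expected support, whereas the actual decremental pruning technique of Chui and Kao derives tighter, progressively decremented estimates; the function name and data representation are ours.

```python
def decremental_prune(transactions, candidates, minsup):
    """Scan the transactions once, maintaining for each candidate an
    upper bound on its final expected support; drop a candidate as
    soon as its bound falls below minsup."""
    n = len(transactions)
    running = {c: 0.0 for c in candidates}   # expected support so far
    alive = set(candidates)
    for t, tx in enumerate(transactions, start=1):
        for cand in list(alive):
            prob = 1.0
            for item in cand:
                prob *= tx.get(item, 0.0)
            running[cand] += prob
            # Each of the n - t remaining transactions contributes at
            # most 1, so this bounds the final expected support.
            if running[cand] + (n - t) < minsup:
                alive.discard(cand)          # prune immediately
    return {c: running[c] for c in alive if running[c] >= minsup}
```

Even this loose bound lets a candidate be discarded partway through the scan, which is the point of the technique: pruning decisions are made as early as the running bound permits, rather than only after a full pass over the data.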