3 Candidate Generate-and-Test Based Uncertain Frequent Pattern Mining
One way to mine frequent patterns from uncertain data is to apply the candidate generate-and-test paradigm. For example, Chui et al. [19] proposed the U-Apriori algorithm, which mines frequent patterns from uncertain data in a levelwise, breadth-first, bottom-up fashion. Specifically, U-Apriori first computes the expected support of all domain items. Those items with expected support ≥ minsup become frequent patterns consisting of 1 item (i.e., frequent 1-itemsets). Afterwards, the U-Apriori algorithm repeatedly applies the candidate generate-and-test process to generate candidate (k+1)-itemsets from frequent k-itemsets and tests whether they are frequent (k+1)-itemsets. Like its counterpart for mining precise data (the Apriori algorithm [8]), U-Apriori also relies on the Apriori property (which is also known as the anti-monotonic property or the downward closure property): all subsets of a frequent pattern must also be frequent. Equivalently, all supersets of any infrequent pattern are also infrequent.
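The levelwise process above can be sketched in a few lines. The following is a minimal illustration, not the published implementation: the database, items, and minsup value are invented for the example, and expected support is computed under the usual independence assumption (the sum over transactions of the product of the itemset's existential probabilities).

```python
from itertools import combinations

# Hypothetical uncertain database: each transaction maps items to
# existential probabilities (names and values are illustrative only).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.6},
    {"b": 0.9, "c": 0.5},
]
minsup = 1.0  # expected-support threshold (illustrative)

def exp_sup(itemset, db):
    """Expected support: sum over transactions of the product of the
    existential probabilities of the itemset's items."""
    total = 0.0
    for t in db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)
        total += p
    return total

def u_apriori(db, minsup):
    items = sorted({i for t in db for i in t})
    # Frequent 1-itemsets: items whose expected support reaches minsup.
    freq = [frozenset([i]) for i in items if exp_sup([i], db) >= minsup]
    result = list(freq)
    k = 1
    while freq:
        # Generate candidate (k+1)-itemsets from unions of frequent k-itemsets.
        cands = {a | b for a in freq for b in freq if len(a | b) == k + 1}
        # Apriori property: every k-subset of a candidate must be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k))}
        # Test step: keep candidates whose expected support reaches minsup.
        freq = [c for c in cands if exp_sup(c, db) >= minsup]
        result += freq
        k += 1
    return result
```

On this toy data, {a}, {b}, and {a, b} come out frequent, while {c} (expected support 0.8) is pruned at level 1, so no candidate containing c is ever generated.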
U-Apriori improves its efficiency by incorporating the LGS-trimming strategy (which includes local trimming, global pruning, and single-pass patch up) [19]. This strategy trims away every item with an existential probability below the user-specified trimming threshold (which is local to each item) from the original probabilistic dataset D of uncertain data and then mines frequent patterns from the resulting trimmed dataset D_Trim. On the one hand, if a pattern X is frequent in D_Trim, then X must be frequent in D. On the other hand, a pattern Y is infrequent in D if expSup(Y, D_Trim) + e(Y) < minsup, where e(Y) is an upper bound on the estimation error of expSup(Y, D_Trim). Such an infrequent pattern Y can be pruned. Moreover, a pattern Z is potentially frequent in D if expSup(Z, D_Trim) ≤ minsup ≤ expSup(Z, D_Trim) + e(Z). To patch up, U-Apriori recovers the missing frequent patterns by verifying the expected supports of potentially frequent patterns with an additional single-pass scan of D.
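The three cases above form a simple three-way decision per pattern. A minimal sketch of that decision logic follows; the function name and arguments are illustrative, and the actual error bounds e(·) in [19] are derived from the trimmed items' probabilities, which is not reproduced here.

```python
def classify(exp_sup_trim, err_bound, minsup):
    """Classify a pattern from its expected support in the trimmed
    dataset D_Trim and an upper bound err_bound on the estimation
    error (illustrative sketch of the LGS decision logic)."""
    if exp_sup_trim >= minsup:
        # Frequent in D_Trim implies frequent in the original dataset D.
        return "frequent"
    if exp_sup_trim + err_bound < minsup:
        # Even the best case cannot reach minsup: safely pruned.
        return "infrequent"
    # Otherwise the pattern must be verified in the patch-up scan of D.
    return "potentially frequent"
```

Only patterns in the third bucket incur the extra single-pass scan, which is why the quality of the error bound directly affects the patch-up overhead.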
Although the LGS strategy improves the efficiency of U-Apriori, the algorithm still suffers from the following problems: (i) there is an overhead in creating D_Trim, (ii) only a subset of all the frequent patterns can be mined from D_Trim and there is overhead in patching up, (iii) the efficiency of the algorithm is sensitive to the percentage of items having low existential probabilities, and (iv) it is not easy to find an appropriate value for the user-specified trimming threshold.
Chui and Kao [18] applied the decremental pruning technique to further improve the efficiency of U-Apriori. The technique helps reduce the number of candidate patterns by progressively estimating the upper bounds of the expected support of candidate patterns after each transaction is processed. If the estimated upper bound of a candidate pattern X falls below minsup, then X is immediately pruned.
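The per-transaction bound can be sketched as follows. This is a deliberately simplified version: it bounds a candidate's remaining contribution by 1 per unprocessed transaction (the technique in [18] derives tighter, probability-aware estimates), and the sample database and names are invented for the example.

```python
def decremental_scan(db, candidates, minsup):
    """Sketch of decremental pruning: after each transaction, a candidate's
    best-case expected support is its accumulated expected support plus the
    number of unprocessed transactions (each contributes at most 1).
    Candidates whose upper bound falls below minsup are pruned mid-scan."""
    acc = {c: 0.0 for c in candidates}
    alive = set(candidates)
    for idx, t in enumerate(db):
        remaining = len(db) - idx - 1
        for c in list(alive):
            # Accumulate this transaction's contribution to expSup(c).
            p = 1.0
            for item in c:
                p *= t.get(item, 0.0)
            acc[c] += p
            if acc[c] + remaining < minsup:
                alive.discard(c)  # upper bound below minsup: prune now
    # Survivors that actually reached minsup are frequent candidates.
    return {c: acc[c] for c in alive if acc[c] >= minsup}

# Hypothetical uncertain database (illustrative values).
db = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.6},
    {"b": 0.9, "c": 0.5},
]
```

Here {a, c} accumulates only 0.27 of expected support, so its upper bound drops below a minsup of 1.0 before the scan finishes and it is discarded without a full support count, while {a, b} survives with expected support 1.14.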