Pattern-Growth Methods - Frequent Pattern Mining

Database Reference

In-Depth Information

1. The ones having itemset a as a prefix w.r.t.

R

, i.e., those containing item a ; and

2. The ones having itemset f as a prefix w.r.t.

R

, i.e., those containing item f but

no a .

They form two projected databases [ 18 ] which can be mined with the constraint C

pushed in. We examine the first one only since the second is similar.

Since a is a frequent itemset satisfying the constraint, the frequent itemsets having

a as a proper prefix can be found in a-projected database (the subset of transactions

containing a ). It contains two transactions: bcdf and cdef . Since items b and e

is infrequent within this projected database, neither ab nor ae can be frequent. So,

they are pruned. The frequent items in the a -projected database is f , d , c , listed in

the order

. Since ac does not satisfy the constraint, there is no need to create an

ac -projected database.

To check what can be mined in the a -projected database with af and ad , as prefix,

respectively, we need to construct the two projected databases and mine them. This

process is similar to the mining of a -projected databases.

The af -projected database contains two frequent items d and c , and only af d

satisfy the constraint. Moreover, since af d c does not satisfies the constraint, the

process in this branch is complete. Since af c violates the constraint, there is no

need to construct af c -projected database. The ad-projected database contains one

frequent item c ,but adc does not satisfy the constraint. Therefore, the set of frequent

itemsets satisfying the constraint and having a as prefix contains a , af , af d , and

ad .

In summary, the complete set of frequent itemsets satisfying the constraint con-

tains 6 itemsets: a , f , af , ad , af d , fg . The method with ordered itemsets and

frequent pattern growth generates and tests only a small set of itemsets.

This example shows that by proper ordering of itemsets, frequent pattern growth

method may push some tough constraints (called convertible constraints ) deeper

than the Apriori methods. A systematic classification of such constraints and a study

of how to push them into the mining process are in [ 27 , 29 ].

R

4

PrefixSpan: Mining Sequential Patterns by Pattern Growth

Sequential pattern mining, which discovers frequent subsequences as patterns in

a sequence database, is an important data mining problem with broad applications,

including the analyses of customer purchase behavior, Web access patterns, scientific

experiments, disease treatments, natural disasters, DNA sequences, and so on.

A sequence database S is a set of tuples

, where sid is a sequence_id and

s is a sequence (i.e., an ordered list of itemsets). A tuple

sid , s

is said to contain a

sequence α ,if α is a subsequence of s , i.e., α

s . The support of a sequence α in

a sequence database S is the number of tuples in the database containing α .Given

a positive integer ξ as the support threshold , a sequence α is called a sequential

pattern in sequence database S if the sequence is contained by at least ξ tuples in

Frequent Pattern Mining

Search WWH ::

Custom Search

Home