Database Reference
In-Depth Information
1. The ones having itemset a as a prefix w.r.t.
R
, i.e., those containing item a ; and
2. The ones having itemset f as a prefix w.r.t.
R
, i.e., those containing item f but
no a .
They form two projected databases [ 18 ] which can be mined with the constraint C
pushed in. We examine the first one only since the second is similar.
Since a is a frequent itemset satisfying the constraint, the frequent itemsets having
a as a proper prefix can be found in a-projected database (the subset of transactions
containing a ). It contains two transactions: bcdf and cdef . Since items b and e
is infrequent within this projected database, neither ab nor ae can be frequent. So,
they are pruned. The frequent items in the a -projected database is f , d , c , listed in
the order
. Since ac does not satisfy the constraint, there is no need to create an
ac -projected database.
To check what can be mined in the a -projected database with af and ad , as prefix,
respectively, we need to construct the two projected databases and mine them. This
process is similar to the mining of a -projected databases.
The af -projected database contains two frequent items d and c , and only af d
satisfy the constraint. Moreover, since af d c does not satisfies the constraint, the
process in this branch is complete. Since af c violates the constraint, there is no
need to construct af c -projected database. The ad-projected database contains one
frequent item c ,but adc does not satisfy the constraint. Therefore, the set of frequent
itemsets satisfying the constraint and having a as prefix contains a , af , af d , and
ad .
In summary, the complete set of frequent itemsets satisfying the constraint con-
tains 6 itemsets: a , f , af , ad , af d , fg . The method with ordered itemsets and
frequent pattern growth generates and tests only a small set of itemsets.
This example shows that by proper ordering of itemsets, frequent pattern growth
method may push some tough constraints (called convertible constraints ) deeper
than the Apriori methods. A systematic classification of such constraints and a study
of how to push them into the mining process are in [ 27 , 29 ].
R
4
PrefixSpan: Mining Sequential Patterns by Pattern Growth
Sequential pattern mining, which discovers frequent subsequences as patterns in
a sequence database, is an important data mining problem with broad applications,
including the analyses of customer purchase behavior, Web access patterns, scientific
experiments, disease treatments, natural disasters, DNA sequences, and so on.
A sequence database S is a set of tuples
, where sid is a sequence_id and
s is a sequence (i.e., an ordered list of itemsets). A tuple
sid , s
sid , s
is said to contain a
sequence α ,if α is a subsequence of s , i.e., α
s . The support of a sequence α in
a sequence database S is the number of tuples in the database containing α .Given
a positive integer ξ as the support threshold , a sequence α is called a sequential
pattern in sequence database S if the sequence is contained by at least ξ tuples in
Search WWH ::




Custom Search