Database Reference
In-Depth Information
9.1.5 Pattern Growth Algorithm
The Apriori algorithm presented above is the most popular approach to
mining association rules. However, when the minimum support is low or
the length of the patterns is long, candidate generation may turn out to
be inecient. In addition, in each iteration, the database must be scanned
with respect to the current candidates, which is also a costly task. To address
these problems, another approach to mining frequent itemsets, called pattern
growth, has been devised. The pattern growth algorithm does not generate
candidate itemsets. The method uses a two-step approach: In the first step,
a compact data structure, called FP-tree , is built to encode the database.
Two scans of the database are required for this. In the second step, the
frequent itemsets are extracted after traversing and partitioning the FP-tree.
We sketch the idea of the algorithm next, using the same transaction database
as above.
TransactionId
Itemset
Ordered Frequent Items
1000
{ 1,2,3 }
1:3, 2:2, 3:2
2000
{
1,3
}
1:3, 3:2
3000
{ 1,4 }
1:3
4000
{
2,5,6
}
2:2
Note that, to better illustrate the algorithm, we added a column to
represent the frequent items, in descending order of support. For example, 1:3
means that item 1 has support 3 (recall that in this example, the minimum
support is 2). Items 4 , 5 ,and 6 were excluded from this new column because
their support is 1. Below, we explain how these values are obtained.
The FP-tree is a data structure defined as follows:
￿ The nodes are of the form i : c ,where i represents the item and c represents
acounter.
￿ There is an edge (indicated as a solid line) between nodes appearing
together in a transaction; a fixed order is used, so paths in the tree overlap
if transactions share items.
￿ There is a table T containing an entry for each item in the tree.
￿ There is an edge (indicated as a dashed line) between nodes containing
the same item (independent of the value of the counter), forming a path
composed of nodes representing the same item.
￿ There is an edge between each entry in T to the node in the tree which is
the head of the path indicated above.
Figure 9.3 shows the FP-tree of our example. We next explain how we built
this tree.
The FP-tree is constructed in two passes over the database. In the first
pass , we scan the database to find the support of each item. We discard items
 
Search WWH ::




Custom Search