Data Analytics: Exploiting the Data Warehouse - Data Warehouse Systems: Design and Implementation - page 338

Database Reference

In-Depth Information

9.1.5 Pattern Growth Algorithm

The Apriori algorithm presented above is the most popular approach to

mining association rules. However, when the minimum support is low or

the length of the patterns is long, candidate generation may turn out to

be inecient. In addition, in each iteration, the database must be scanned

with respect to the current candidates, which is also a costly task. To address

these problems, another approach to mining frequent itemsets, called pattern

growth, has been devised. The pattern growth algorithm does not generate

candidate itemsets. The method uses a two-step approach: In the first step,

a compact data structure, called FP-tree , is built to encode the database.

Two scans of the database are required for this. In the second step, the

frequent itemsets are extracted after traversing and partitioning the FP-tree.

We sketch the idea of the algorithm next, using the same transaction database

as above.

TransactionId

Itemset

Ordered Frequent Items

1000

{ 1,2,3 }

1:3, 2:2, 3:2

2000

{

1,3

}

1:3, 3:2

3000

{ 1,4 }

1:3

4000

{

2,5,6

}

2:2

Note that, to better illustrate the algorithm, we added a column to

represent the frequent items, in descending order of support. For example, 1:3

means that item 1 has support 3 (recall that in this example, the minimum

support is 2). Items 4 , 5 ,and 6 were excluded from this new column because

their support is 1. Below, we explain how these values are obtained.

The FP-tree is a data structure defined as follows:

The nodes are of the form i : c ,where i represents the item and c represents

acounter.

There is an edge (indicated as a solid line) between nodes appearing

together in a transaction; a fixed order is used, so paths in the tree overlap

if transactions share items.

There is a table T containing an entry for each item in the tree.

There is an edge (indicated as a dashed line) between nodes containing

the same item (independent of the value of the counter), forming a path

composed of nodes representing the same item.

There is an edge between each entry in T to the node in the tree which is

the head of the path indicated above.

Figure 9.3 shows the FP-tree of our example. We next explain how we built

this tree.

The FP-tree is constructed in two passes over the database. In the first

pass , we scan the database to find the support of each item. We discard items

Next Page

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home