Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

can identify the case of closed itemsets during mining. Pruning strategies include the

following:

Item merging : If every transaction containing a frequent itemset X also contains an itemset

Y but not any proper superset of Y , then X [ Y forms a frequent closed itemset and there

is no need to search for any itemset containing X but no Y .

For example, in Table 6.2 of Example 6.5, the projected conditional database for

prefix itemset fI5:2g is ffI2, I1g, fI2, I1, I3gg, from which we can see that each of its

transactions contains itemset fI2, I1g but no proper superset of fI2, I1g. Itemset fI2,

I1g can be merged with fI5g to form the closed itemset, fI5, I2, I1: 2g, and we do not

need to mine for closed itemsets that contain I5 but not fI2, I1g.

Sub-itemset pruning : If a frequent itemset X is a proper subset of an already found fre-

quent closed itemset Y and support count(X)Dsupport count(Y) , then X and all of X 's

descendants in the set enumeration tree cannot be frequent closed itemsets and thus can

be pruned.

Similar to Example 6.2, suppose a transaction database has only two trans-

actions: fh a 1 , a 2 ,

, a 50 ig, and the minimum support count is

min sup D 2. The projection on the first item, a 1 , derives the frequent itemset, f a 1 ,

a 2 ,

:::

, a 100 i, h a 1 , a 2 ,

:::

, a 50 : 2g, based on the itemset merging optimization. Because support (f a 2 g) D

support (f a 1 , a 2 ,

:::

, a 50 g, there

is no need to examine a 2 and its projected database. Similar pruning can be done

for a 3 ,

:::

, a 50 g) D 2, and f a 2 g is a proper subset of f a 1 , a 2 ,

:::

, a 50 as well. Thus, the mining of closed frequent itemsets in this data set

terminates after mining a 1 's projected database.

:::

Item skipping : In the depth-first mining of closed itemsets, at each level, there will be

a prefix itemset X associated with a header table and a projected database. If a local

frequent item p has the same support in several header tables at different levels, we can

safely prune p from the header tables at higher levels .

Consider, for example, the previous transaction database having only two trans-

actions: fh a 1 , a 2 ,

, a 50 ig, where min sup D 2. Because a 2 in a 1 's

projected database has the same support as a 2 in the global header table, a 2 can be

pruned from the global header table. Similar pruning can be done for a 3 ,

:::

, a 100 i, h a 1 , a 2 ,

:::

, a 50 .

There is no need to mine anything more after mining a 1 's projected database.

Besides pruning the search space in the closed itemset mining process, another

important optimization is to perform efficient checking of each newly derived frequent

itemset to see whether it is closed. This is because the mining process cannot ensure that

every generated frequent itemset is closed.

When a new frequent itemset is derived, it is necessary to perform two kinds of

closure checking: (1) superset checking , which checks if this new frequent itemset is a

superset of some already found closed itemsets with the same support, and (2) subset

checking , which checks whether the newly found itemset is a subset of an already found

closed itemset with the same support.

If we adopt the item merging pruning method under a divide-and-conquer frame-

work, then the superset checking is actually built-in and there is no need to explicitly

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home