Databases Reference
In-Depth Information
can identify the case of closed itemsets during mining. Pruning strategies include the
following:
Item merging : If every transaction containing a frequent itemset X also contains an itemset
Y but not any proper superset of Y , then X [ Y forms a frequent closed itemset and there
is no need to search for any itemset containing X but no Y .
For example, in Table 6.2 of Example 6.5, the projected conditional database for
prefix itemset fI5:2g is ffI2, I1g, fI2, I1, I3gg, from which we can see that each of its
transactions contains itemset fI2, I1g but no proper superset of fI2, I1g. Itemset fI2,
I1g can be merged with fI5g to form the closed itemset, fI5, I2, I1: 2g, and we do not
need to mine for closed itemsets that contain I5 but not fI2, I1g.
Sub-itemset pruning : If a frequent itemset X is a proper subset of an already found fre-
quent closed itemset Y and support count(X)Dsupport count(Y) , then X and all of X 's
descendants in the set enumeration tree cannot be frequent closed itemsets and thus can
be pruned.
Similar to Example 6.2, suppose a transaction database has only two trans-
actions: fh a 1 , a 2 ,
, a 50 ig, and the minimum support count is
min sup D 2. The projection on the first item, a 1 , derives the frequent itemset, f a 1 ,
a 2 ,
:::
, a 100 i, h a 1 , a 2 ,
:::
, a 50 : 2g, based on the itemset merging optimization. Because support (f a 2 g) D
support (f a 1 , a 2 ,
:::
, a 50 g, there
is no need to examine a 2 and its projected database. Similar pruning can be done
for a 3 ,
:::
, a 50 g) D 2, and f a 2 g is a proper subset of f a 1 , a 2 ,
:::
, a 50 as well. Thus, the mining of closed frequent itemsets in this data set
terminates after mining a 1 's projected database.
:::
Item skipping : In the depth-first mining of closed itemsets, at each level, there will be
a prefix itemset X associated with a header table and a projected database. If a local
frequent item p has the same support in several header tables at different levels, we can
safely prune p from the header tables at higher levels .
Consider, for example, the previous transaction database having only two trans-
actions: fh a 1 , a 2 ,
, a 50 ig, where min sup D 2. Because a 2 in a 1 's
projected database has the same support as a 2 in the global header table, a 2 can be
pruned from the global header table. Similar pruning can be done for a 3 ,
:::
, a 100 i, h a 1 , a 2 ,
:::
:::
, a 50 .
There is no need to mine anything more after mining a 1 's projected database.
Besides pruning the search space in the closed itemset mining process, another
important optimization is to perform efficient checking of each newly derived frequent
itemset to see whether it is closed. This is because the mining process cannot ensure that
every generated frequent itemset is closed.
When a new frequent itemset is derived, it is necessary to perform two kinds of
closure checking: (1) superset checking , which checks if this new frequent itemset is a
superset of some already found closed itemsets with the same support, and (2) subset
checking , which checks whether the newly found itemset is a subset of an already found
closed itemset with the same support.
If we adopt the item merging pruning method under a divide-and-conquer frame-
work, then the superset checking is actually built-in and there is no need to explicitly
 
Search WWH ::




Custom Search