Database Reference
In-Depth Information
Fig. 2.15 Aggregating bucket
counts
transaction. The fact that the number of distinct transactions in the projected database
is small can be exploited to yield substantially more efficient counting algorithms.
The aim is to count the support for the entire subtree rooted at P with a quick pass
through the data, and an additional postprocessing phase which is independent of
database size. The process of performing bucket counting consists of two phases:
1. In the first phase, the counts of each distinct transaction present in the projected
database are determined. This can be accomplished easily by maintaining 2 | E ( P ) |
buckets or counters, scanning the transactions one by one, and adding counts to
the buckets. The time for performing this set of operations is linear in the number
of (projected) database transactions.
2. In the second phase, the counts of the 2 | E ( P ) | transaction are used to determine
the aggregate support counts for each itemset. In general, the support count of an
itemset may be obtained by adding the counts of all the supersets of that itemset
to it. A skillful algorithm (from the efficiency perspective) for performing these
operations is illustrated in Fig. 2.15 .
Consider a string composed of 0, 1, and
that refers to an itemset in which the
positions with 0 and 1 are fixed to those values (corresponding to presence or absence
of items), while a position with a
is a “don't care”. Thus, all itemsets can be
expressed in terms of 1 and
because itemsets are traditionally defined with respect
to presence of items. Consider for example, the case when
| E ( P )
|=
4, and there are
four items, numbered
{
1, 2, 3, 4
}
. An itemset containing items 2 and 4 is denoted by
1. We start off with the information on 2 4
16 bitstrings which are composed of
0 and 1. These represent all possible distinct transactions. The algorithm aggregates
the counts in
1
=
iterations. The count for a string with a “*” in a particular
position may be obtained by adding the counts for the strings with a 0 and 1 in those
positions. For example, the count for the string *1*1 may be expressed as the sum
of the counts of the strings 01*1 and 11*1.
|
E ( P )
|
Search WWH ::




Custom Search