Database Reference
In-Depth Information
Fig. 2.6 Enumeration tree
exploration
at that node. Therefore, the database is also projected in terms of attributes, in which
only items which are candidate extensions at a node are retained. The candidate set
F ( P ) of item extensions of node P is a very small subset of the universe of items
at lower levels of the enumeration tree. In fact, even the items in the node P need
not be retained explicitly in the transaction, because they are known to always be
present in all the selected transactions based on the first condition. This projection
process is performed recursively in top-down fashion down the enumeration tree
for counting purposes, where lower level nodes inherit the projections from higher
level nodes and add one additional item to the projection at each level. The idea of
this inheritance-based approach is that the projected database remembers the count-
ing work done at higher levels of the enumeration tree by (successively) removing
irrelevant transactions and irrelevant items at each level of the projection. Such an
approach works efficiently because it never repeats the counting work which has
already been done at the higher levels. Thus, the primary savings in the strategy arise
from avoiding repetitive and wasteful counting.
A bare-bones depth-first version of TreeProjection , that is similar to DepthProject ,
but without maximal pruning, is described in Fig. 2.6 . A more detailed descrip-
tion with maximal pruning and other optimizations is provided later in this chapter.
Because the algorithm is described recursively, the current prefix P (node of the
lexicographic tree) being extended is one of the arguments to the algorithm. In the
initial call, the value of P is null because one intends to determine all frequent de-
scendants at the root of the lexicographic tree. This algorithm recursively extends
frequent prefixes and maintains only the transaction database relevant to the prefix.
The frequent prefixes are extended by determining the items i that are frequent in
T
is reported. The extension of the frequent prefix can
be viewed as a recursive call at a node of the enumeration tree. Thus, at a given
enumeration tree node, one now has a completely independent problem of extending
the prefix with the projected database that is relevant to all descendants of that node.
The conditional database
. Then the itemset P
∪{
i
}
T i refers to the subset of the original transaction database
T
corresponding to transactions containing item i . Furthermore, the item i and any
item occurring lexicographically earlier to it is not retained in the database because
Search WWH ::




Custom Search