Database Reference
In-Depth Information
take place during this step. The same process is repeated two more times with the
second and third order bits. At the end of three passes, each bucket contains the
support count for the appropriate itemset, where the '0' for the itemset is replaced
by a “don't care” which is represented by a '*'. Note that the number of transactions
in this example is 27. This is represented by the entry for the bucket ***. Only two
transactions contain all three items that is represented by the bucket 111.
The projection-based methods were shown to have an order of magnitude im-
provement over the MaxMiner algorithm. The depth-first approach has subsequently
been used in the context of many tree-based algorithms. Other examples of such
algorithms include those in [ 17 , 18 , 14 ]. Among these, the MAFIA algorithm [ 14 ]
is discussed in some detail in the next subsection. An approach which varies on
the projection methodology, and uses opportunistic projection is discussed in [ 38 ].
This algorithm opportunistically chooses between array-based and tree-based rep-
resentations to represent projected transaction subsets. Such an approach has been
shown to be more efficient than many state of the art methods such as the FP-Growth
method. Other variations of tree-based algorithms have also been proposed [ 70 ] that
use different strategies in tree exploration.
5.2.3
MAFIA Algorithm
The MAFIA algorithm proposed in [ 14 ] shares a number of similarities to the Depth-
Project approach, though it uses a bitmap based approach for counting, rather than
the use of a projected transaction database. In the bitmap-based approach, a sequence
of bits is maintained for each itemset that corresponds to whether or not that transac-
tion contains that particular item. Sparse representations (such as a list of transaction
identifiers) may also be used, when the fraction of transactions containing the itemset
is small. Note that such an approach may be considered a special case of database
projection [ 5 ], in which vertical projection is used but horizontal projection is not.
This has the advantage of requiring less memory, but it reuses a smaller fraction of
the counting information from higher level nodes. A number of other pruning opti-
mizations have also been proposed in this work that further improve the effectiveness
of the algorithm. In particular, it has been pointed out that when the support of the
extension of a node is the same as that of its parent, then that subtree can be pruned
away, because of the counts of all the itemsets in the subtree can be derived from those
of other itemsets in the data. This is the same as the support lower bounding trick
discussed in Sect. 2.4 , and also used in MaxMiner for pruning. Thus, the approach
in [ 14 ] uses many of the same strategies used in MaxMiner and TreeProjection ,but
with in a different combination, and with some variations on specific implementation
details.
Search WWH ::




Custom Search