Database Reference
In-Depth Information
7.3
Online Association Rule Mining
In many applications, a user may wish to query the transaction data to find the asso-
ciation rules or the frequent patterns. In such cases, even at high support levels, it is
often impossible to create the frequent patterns in online time because of the multiple
passes required over a potentially large database. One of the earliest algorithms for
online association rule mining was proposed in [ 6 ]. In this approach, an augmented
lexicographic tree is stored either on disk or in main-memory. The lexicographic tree
is augmented with all the edges represented the subset relationships between item-
sets, and is also referred to as the itemset lattice . For any given query, the itemset
lattice may be traversed to determine the association rules. It has been shown in [ 6 ],
that such an approach can also be used to determine the non-redundant association
rules in the underlying data. A second method [ 40 ] uses a condensed frequent pattern
tree (instead of a lattice) to pre-process and store the itemsets. This structure can be
queried to provide online responses.
A very different approach for online association rule mining has been proposed
in [ 34 ], in which the transaction database is processed in real time. In this case, an
incremental approach is used to mine the transaction database. This is a Continuous
Association Rule Mining Algorithm , which is referred to as CARMA . In this case,
transactions are processed as they arrive, and candidate itemsets are generated on
the fly, by examining the subsets of that transaction. Clearly, the downside is that
such an approach is that it will create a lot more candidates than any of the offline
algorithms which use levelwise methods to generate the candidates. This general
characteristic is of course true of any algorithm which tries to reduce the number of
passes with approximate candidate generation. One interesting characteristic of the
CARMA algorithm is that it allows the user to change the minimum support level
during execution. In that case, the algorithm is guaranteed to have generated the
supersets of the true itemsets in the data. If desired, a second pass over the data can
be used to remove the spurious frequent itemsets.
Many streaming methods have also been proposed that use only one pass over the
transaction data [ 19 - 21 , 35 , 43 ]. It should be pointed out that it is often difficult to find
even 1-itemsets exactly over a data stream because of the one-pass constraint [ 21 ],
when the number of distinct items is larger than the main memory availability. This
is often true of k -itemsets as well, especially at low support levels. Furthermore,
if the patterns in the stream change over time, then the frequent k -itemsets will
change significantly as well. These methods therefore have the challenge of finding
the frequent itemsets efficiently, maintaining them, and handling issues involving
evolution of the data stream. Given the numerous challenges of pattern mining in
this scenario, most of these methods find the frequent items approximately. These
issues will be discussed in detail in Chap. 9 on streaming pattern mining algorithms.
Search WWH ::




Custom Search