Database Reference
In-Depth Information
We now describe the complete hybrid algorithm. The algorithm is referred to as
STREAMMINING and is illustrated in Algorithm 8 The algorithm has two inter-
leaved phases. The first phase deals with 2-itemsets, and the second phase deals
with k -itemsets, k> 2. The main procedure uses three subroutines, UPDATE, RE-
DUCFREQ, and TWOITEMSETPERTRANSACTION, shown separately in Algorithm 9.
The first phase extends the BOUNDEDCOUNT algorithm to deal with 2-itemsets. As
we stated previously, the algorithm maintains a buffer
which stores the recently
received transactions. Initially, the buffer is empty. When a new transaction t arrives,
we put it in
T
T
. Next, we call the UPDATE routine to increment counts in
L 2 . This rou-
tine simply updates the count of 2-itemsets that are already in
L 2 . If a new 2-itemset
appears, it is inserted into
L 2 .
When the size of
f , where f is a weighted average
number of 2-itemsets per transaction, we call the procedure REDUCFREQ to reduce
the count of each 2-itemsets in
L 2 is beyond the threshold
1
L 2 , and the itemsets whose count becomes zero are
deleted. Invoking REDUCFREQ on
L 2 triggers the second phase.
The second phase of the algorithm deals with all k -itemsets, k> 2. This process
is carried out level-wise, i.e, it proceeds from 3-itemsets to the largest potential
frequent itemsets. For each transaction in the buffer
T
, we enumerate all i -subsets.
Search WWH ::




Custom Search