Database Reference
In-Depth Information
We now describe the complete hybrid algorithm. The algorithm is referred to as
STREAMMINING and is illustrated in Algorithm 8 The algorithm has two inter-
leaved phases. The first phase deals with 2-itemsets, and the second phase deals
with
k
-itemsets,
k>
2. The main procedure uses three subroutines, UPDATE, RE-
DUCFREQ, and TWOITEMSETPERTRANSACTION, shown separately in Algorithm 9.
The first phase extends the BOUNDEDCOUNT algorithm to deal with 2-itemsets. As
we stated previously, the algorithm maintains a buffer
which stores the recently
received transactions. Initially, the buffer is empty. When a new transaction
t
arrives,
we put it in
T
T
. Next, we call the UPDATE routine to increment counts in
L
2
. This rou-
tine simply updates the count of 2-itemsets that are already in
L
2
. If a new 2-itemset
appears, it is inserted into
L
2
.
When the size of
f
, where
f
is a weighted average
number of 2-itemsets per transaction, we call the procedure REDUCFREQ to reduce
the count of each 2-itemsets in
L
2
is beyond the threshold
1
/θ
L
2
, and the itemsets whose count becomes zero are
deleted. Invoking REDUCFREQ on
L
2
triggers the
second
phase.
The second phase of the algorithm deals with all
k
-itemsets,
k>
2. This process
is carried out level-wise, i.e, it proceeds from 3-itemsets to the largest potential
frequent itemsets. For each transaction in the buffer
T
, we enumerate all
i
-subsets.