For any i-subset that is already in L, the process will be the same as for a 2-itemset, i.e., we will simply increment the count. However, an i-subset that is not in L will be inserted into L only if all of its (i − 1)-subsets are in L as well. Thus, we use the Apriori property.
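To make the insertion rule concrete, here is a minimal sketch in Python, assuming the lattice L is kept as a dict mapping frozenset itemsets to their counts (the names lattice and update_subset are illustrative, not from Jin and Agrawal's paper):

    from itertools import combinations

    def update_subset(lattice, subset):
        """Increment an i-subset already in the lattice; otherwise insert
        it only if every one of its (i - 1)-subsets is already present."""
        if subset in lattice:
            lattice[subset] += 1      # already tracked: just count it
        elif all(frozenset(s) in lattice
                 for s in combinations(subset, len(subset) - 1)):
            lattice[subset] = 1       # Apriori check passed: insert

    # {'a', 'c'} is missing below, so the 3-subset {'a', 'b', 'c'} is rejected
    lattice = {frozenset('ab'): 2, frozenset('bc'): 1}
    update_subset(lattice, frozenset('abc'))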
After updating i-itemsets in L, we again invoke the REDUCFREQ routine. Thus, the itemsets whose count is only 1 will be deleted from the lattice. This procedure will continue until there are no frequent k-itemsets in L. At the end of this, we clear the buffer and start processing new transactions in the stream. This will restart the first phase of our algorithm to deal with 2-itemsets.
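The pruning pass can be sketched just as briefly, again over the assumed dict-based lattice; REDUCFREQ is reduced here to the effect described above, deleting itemsets whose count is only 1:

    def reduc_freq(lattice):
        """Delete every itemset whose count is only 1."""
        for itemset in [s for s, c in lattice.items() if c == 1]:
            del lattice[itemset]

    lattice = {frozenset('a'): 4, frozenset('b'): 1, frozenset('ab'): 1}
    reduc_freq(lattice)
    print(lattice)    # {frozenset({'a'}): 4} -- only counts above 1 survive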
In their experimental results, Jin and Agrawal show that STREAMMINING is more
memory efficient than LOSSY COUNTING. On the T10.I4.N10K dataset used in Manku
and Motwani's paper, with 1 million transactions and a support level of 1 %, LOSSY
COUNTING requires an out-of-core data structure on top of a 44 MB buffer. On the
other hand, for larger datasets ranging from 4 million to 20 million transactions, Jin
and Agrawal's algorithm only requires 2.5 MB in main memory.
Tilted Time Window Model Giannella et al. [28] employ a tilted time window model to record a compressed history of the entire data stream's frequent patterns. Its unique windowing structure allows it to approximately answer queries about frequent itemsets for a time window from any t1 to t2. The window size of the response is within 50 % of the size requested by the user.
The most distinctive feature is the set of time windows of growing width. Data are processed and recorded in batches of uniform size, but as time progresses, older records are merged together. This leverages the fact that users often want to know about recent history with fine granularity but are satisfied with a coarser granularity for older time periods. Because the window sizes grow exponentially, the number of windows grows only logarithmically, thus making the memory requirements tractable.
The candidate frequent patterns for each data batch are discovered using an FP-tree with error factor ε, like the one in LOSSY COUNTING [61]. These candidate patterns and their counts are summarized in a table; f(i, j) denotes the table of frequent patterns for the time interval [ti : tj]. As batches become older, they are merged with adjacent batches into windows whose sizes are powers of 2. If the total number of elapsed batches is 2^n for some integer n, then there will be n + 1 windows of sizes 1, 1, 2, 4, 8, etc. When the total number of batches is not such a perfect power of 2, there may temporarily be up to three windows covering just a single batch and up to two windows of each larger size. For example, if 16 batches have been processed, then the windows would be sized as follows:

f(0, 7), f(8, 11), f(12, 13), f(14), f(15)

However, if only 15 batches have been processed, then these would be the sizes:

f(0, 3), f(4, 7), f(8, 9), f(10, 11), f(12), f(13), f(14)
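A small simulation reproduces these layouts. The merge rule coded below (once a fourth single-batch window, or a third window of any larger size, appears, merge the two oldest windows of that size) is inferred from the two examples above rather than quoted from Giannella et al., but it yields exactly the layouts shown:

    def tilted_windows(num_batches):
        sizes = []                           # window sizes, oldest first
        for _ in range(num_batches):
            sizes.append(1)                  # each new batch is its own window
            size, limit = 1, 3               # up to three single-batch windows
            while sizes.count(size) > limit:
                i = sizes.index(size)        # oldest window of this size
                sizes[i:i + 2] = [size * 2]  # merge the two oldest into one
                size, limit = size * 2, 2    # up to two windows per larger size
        out, start = [], 0                   # render as f(start, end) intervals
        for s in sizes:
            end = start + s - 1
            out.append(f"f({start})" if s == 1 else f"f({start}, {end})")
            start = end + 1
        return ", ".join(out)

    print(tilted_windows(16))  # f(0, 7), f(8, 11), f(12, 13), f(14), f(15)
    print(tilted_windows(15))  # f(0, 3), f(4, 7), f(8, 9), f(10, 11), f(12), f(13), f(14)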
Similar to how LOSSY COUNTING will delete an itemset if its support level falls below ε, the tilted time window algorithm (FP-STREAMING) also drops infrequent itemsets.