Database Reference
In-Depth Information
(a) Use the file that was collected while the first iteration of the algorithm was run-
ning. At the same time, collect yet another file to be used at another iteration of
the algorithm, when this current iteration finishes.
(b) Start collecting another file of baskets now, and run the algorithm when an ad-
equate number of baskets has been collected.
(2) We can continue to count the numbers of occurrences of each of these frequent item-
sets, along with the total number of baskets seen in the stream, since the counting star-
ted. If any itemset is discovered to occur in a fraction of the baskets that is signific-
antly below the threshold fraction s , then this set can be dropped from the collection of
frequent itemsets. When computing the fraction, it is important to include the occur-
rences from the original file of baskets, from which the frequent itemsets were derived.
If not, we run the risk that we shall encounter a short period in which a truly frequent
itemset does not appear sufficiently frequently and throw it out. We should also allow
some way for new frequent itemsets to be added to the current collection. Possibilities
include:
(a) Periodically gather a new segment of the baskets in the stream and use it as the
data file for another iteration of the chosen frequent-itemsets algorithm. The new
collection of frequent items is formed from the result of this iteration and the fre-
quent itemsets from the previous collection that have survived the possibility of
having been deleted for becoming infrequent.
(b) Add some random itemsets to the current collection, and count their fraction of
occurrences for a while, until one has a good idea of whether or not they are cur-
rently frequent. Rather than choosing new itemsets completely at random, one
might focus on sets with items that appear in many itemsets already known to be
frequent. For example, a good choice is to pick new itemsets from the negative
border ( Section 6.4.5 ) of the current set of frequent itemsets.
6.5.2
Frequent Itemsets in Decaying Windows
Recall from Section 4.7 that a decaying window on a stream is formed by picking a small
constant c and giving the i th element prior to the most recent element the weight (1− c ) i , or
approximately e ci . Section 4.7.3 actually presented a method for computing the frequent
items, provided the support threshold is defined in a somewhat different way. That is, we
considered, for each item, a stream that had 1 if the item appeared at a certain stream ele-
ment and 0 if not. We defined the “score” for that item to be the sum of the weights where
its stream element was 1. We were constrained to record all items whose score was at least
1/2. We can not use a score threshold above 1, because we do not initiate a count for an
Search WWH ::




Custom Search