Database Reference
In-Depth Information
Suppose item e is frequent, meaning that it occurs at least times in a dataset.
The KPS method randomly removes
unique items from the dataset. This
eliminates at most one instance of e . We can remove at most
1
θ
such sets, with the
final set possibly being smaller than
. It follows that the final set must contain
e . However, there is no way to detect which members of the result set are frequent
and which are not.
Jin and Agrawal enhance KPS to use a single pass and to implement an accuracy
bound analogous to Manku and Motwani's work [ 61 ]. Besides reporting all items
or itemsets that occur with frequency more than , they report only the items
or itemsets which appear with frequency at least (1
1
1.
Algorithm 7 (BOUNDEDCOUNT outlines their method. In the first stage, we invoke
the algorithm from Karp et al., with frequency level θ .
), where 0 <
is the set of potentially
frequent items. We maintain a count for each item in the set
P
. This set is initially
empty. As the algorithm processes a new item from a sequence, we check if it is in
the set
P
. If yes, its count is incremented, otherwise, it is inserted with a count of
1. When the size of the set
P
P
becomes larger than
1
, we decrement the count
of each item in
, and eliminate any item whose count has now become 0. This
processing is equivalent to the removals we described earlier. We also record the
number of elimination rounds, c , that occur. Clearly, c
P
. In the second step,
we remove all items whose reported frequency is less than
c
(1
).
Search WWH ::




Custom Search