Frequent Pattern Mining in Data Streams - Frequent Pattern Mining - page 207

Database Reference

In-Depth Information

Suppose item e is frequent, meaning that it occurs at least Nθ times in a dataset.

The KPS method randomly removes

unique items from the dataset. This

eliminates at most one instance of e . We can remove at most

1 /θ

θ

such sets, with the

final set possibly being smaller than

. It follows that the final set must contain

e . However, there is no way to detect which members of the result set are frequent

and which are not.

Jin and Agrawal enhance KPS to use a single pass and to implement an accuracy

bound analogous to Manku and Motwani's work [ 61 ]. Besides reporting all items

or itemsets that occur with frequency more than Nθ , they report only the items

or itemsets which appear with frequency at least Nθ (1

1 /θ

1.

Algorithm 7 (BOUNDEDCOUNT outlines their method. In the first stage, we invoke

the algorithm from Karp et al., with frequency level θ .

−

), where 0 <

≤

is the set of potentially

frequent items. We maintain a count for each item in the set

P

. This set is initially

empty. As the algorithm processes a new item from a sequence, we check if it is in

the set

P

. If yes, its count is incremented, otherwise, it is inserted with a count of

1. When the size of the set

P

P

becomes larger than

1 /θ

, we decrement the count

of each item in

, and eliminate any item whose count has now become 0. This

processing is equivalent to the removals we described earlier. We also record the

number of elimination rounds, c , that occur. Clearly, c

P

≤

Nθ . In the second step,

we remove all items whose reported frequency is less than Nθ

−

≥

−

c

Nθ (1

).

Next Page

Frequent Pattern Mining

Search WWH ::

Custom Search

Home