Frequent Pattern Mining in Data Streams - Frequent Pattern Mining


frequent pattern mining. There are numerous variations of this mining task, including different data windowing approaches and different guarantees of counting accuracy. Though many improvements in efficiency have been made over the years, the basic principle of LOSSY COUNTING remains one of the core approaches. To reduce memory requirements, some algorithms seek only the maximal frequent itemsets or the closed frequent itemsets. Because every frequent itemset can be recovered from these (together with its exact support, in the case of closed itemsets), they act as a form of data compression. In these cases, it may be possible to discover an exact result set. Recently, several algorithms have considered the problem of uncertain data. In addition to frequent itemset mining, a few works have studied frequent subsequences and subgraphs. In the future, we can expect that all forms of frequent pattern mining for fixed datasets will be investigated for streaming data as well.
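The counting principle behind LOSSY COUNTING can be illustrated on its simplest case, a stream of single items rather than itemsets. The sketch below follows the usual description of the algorithm (bucket width ⌈1/ε⌉, one (f, Δ) entry per tracked item, pruning at bucket boundaries); the function name `lossy_count` and its parameters are our own, and the itemset version, which is considerably more involved, is not shown. Under these assumptions, every item with true count at least sN is reported, each reported count underestimates the true count by at most εN, and nothing with true count below (s − ε)N is reported.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple


def lossy_count(stream: Iterable[Hashable], support: float,
                epsilon: float) -> List[Tuple[Hashable, int]]:
    """Single-item Lossy Counting: report items whose estimated count is
    at least (support - epsilon) * N, where N is the stream length."""
    if not 0 < epsilon < support:
        raise ValueError("need 0 < epsilon < support")
    width = math.ceil(1 / epsilon)                 # bucket width w = ceil(1/epsilon)
    entries: Dict[Hashable, Tuple[int, int]] = {}  # item -> (f, delta)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)              # id of the current bucket
        if item in entries:
            f, delta = entries[item]
            entries[item] = (f + 1, delta)
        else:
            # delta bounds how many occurrences may have been pruned earlier
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # bucket boundary: drop entries with f + delta <= current bucket id
            entries = {e: (f, d) for e, (f, d) in entries.items()
                       if f + d > bucket}
    threshold = (support - epsilon) * n
    return [(e, f) for e, (f, d) in entries.items() if f >= threshold]


if __name__ == "__main__":
    # "a" and "b" recur; each "x<i>" appears once and is pruned.
    stream = []
    for i in range(200):
        stream += ["a", "b", "a", f"x{i}"]
    print(lossy_count(stream, support=0.2, epsilon=0.05))
    # prints [('a', 400), ('b', 200)]
```

Memory stays small here because each one-off item is discarded at the end of the bucket in which it arrived, while the recurring items are never pruned and so keep exact counts.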
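To make the compression argument concrete, the toy sketch below enumerates the frequent itemsets of a tiny static database by brute force and then keeps only the closed ones (no proper superset has the same support) and the maximal ones (no proper superset is frequent). All function names are hypothetical, and the brute-force miner is for illustration only; it is not a streaming algorithm. The last function shows why closed itemsets lose no information: the support of any frequent itemset equals the largest support among its closed supersets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]],
                      min_count: int) -> Dict[Itemset, int]:
    """Brute-force miner for a toy static database (exponential in general)."""
    items = sorted(set().union(*transactions))
    result: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                result[candidate] = count
                found = True
        if not found:
            break  # no frequent k-itemset, so no frequent (k+1)-itemset either
    return result


def closed_itemsets(freq: Dict[Itemset, int]) -> Dict[Itemset, int]:
    """Keep itemsets with no proper superset of equal support."""
    return {x: c for x, c in freq.items()
            if not any(x < y and c == freq[y] for y in freq)}


def maximal_itemsets(freq: Dict[Itemset, int]) -> Dict[Itemset, int]:
    """Keep itemsets with no frequent proper superset."""
    return {x: c for x, c in freq.items() if not any(x < y for y in freq)}


def support_from_closed(x: Itemset, closed: Dict[Itemset, int]) -> int:
    """Support of a frequent itemset = max support over its closed supersets."""
    return max(c for y, c in closed.items() if x <= y)


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
    freq = frequent_itemsets(db, min_count=2)
    closed = closed_itemsets(freq)
    maximal = maximal_itemsets(freq)
    print(len(freq), len(closed), len(maximal))  # prints 5 3 2
    assert all(support_from_closed(x, closed) == c for x, c in freq.items())
```

On this database, five frequent itemsets reduce to three closed and two maximal ones. The maximal sets still identify every frequent itemset (exactly their subsets), but unlike the closed sets they do not preserve the supports.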

One technique that has not yet been studied much is mining from a sample of the data stream. This may be an interesting area for future work. Compared with existing sampling techniques [12, 18, 73] for frequent itemset mining on disk-resident datasets, sampling data streams raises some new issues. For example, the underlying distribution of the data stream can change over time, so the sampling needs to adapt to the stream. However, it is hard to detect such drift accurately without mining the set of frequent itemsets directly. In addition, the space requirement of the sample set can also be an issue. As pointed out by Manku and Motwani [60], methods similar to concise sampling [29] might help reduce the space and achieve better mining results.
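As an illustration of the space argument only, the sketch below keeps a concise-sampling-style sample of transactions: each arrival is sampled with probability 1/τ, repeated values are stored once with a count, and when the sample exceeds its budget, τ is raised and the existing sample points are thinned so that the sample stays uniform at the new rate. This is our reading of the idea in [29], adapted to transactions; the growth factor of 1.1, the budget measured as the number of distinct stored transactions, and all class and function names are assumptions. The sketch does nothing about drift, and the compact representation only saves space when identical transactions recur.

```python
import random
from collections import Counter
from typing import FrozenSet, Hashable, Optional


class ConciseSample:
    """Bernoulli sample of a stream stored as value -> count (hypothetical API).

    Each arrival enters with probability 1/tau. When more than max_entries
    distinct values are stored, tau is multiplied by `growth` and every stored
    occurrence survives independently with probability tau_old / tau_new, so
    the sample remains a uniform Bernoulli sample at rate 1 / tau_new.
    """

    def __init__(self, max_entries: int, growth: float = 1.1,
                 seed: Optional[int] = None) -> None:
        if growth <= 1:
            raise ValueError("growth must be > 1")
        self.max_entries = max_entries
        self.growth = growth            # assumed growth factor for tau
        self.tau = 1.0                  # entry threshold, starts at 1
        self.counts: Counter = Counter()
        self.rng = random.Random(seed)

    def add(self, value: Hashable) -> None:
        if self.rng.random() < 1.0 / self.tau:
            self.counts[value] += 1
            while len(self.counts) > self.max_entries:
                self._raise_threshold()

    def _raise_threshold(self) -> None:
        new_tau = self.tau * self.growth
        keep = self.tau / new_tau
        thinned: Counter = Counter()
        for value, count in self.counts.items():
            survivors = sum(1 for _ in range(count) if self.rng.random() < keep)
            if survivors:
                thinned[value] = survivors
        self.counts = thinned
        self.tau = new_tau


def estimated_support(sample: ConciseSample, itemset: FrozenSet[str]) -> float:
    """Fraction of sampled transactions that contain `itemset`."""
    total = sum(sample.counts.values())
    if total == 0:
        return 0.0
    hits = sum(c for t, c in sample.counts.items() if itemset <= t)
    return hits / total
```

Feeding each transaction to `add` as a frozenset and then calling `estimated_support` gives a relative-support estimate for any itemset. When the budget is large enough that τ never rises, the sample is the whole stream and the estimates are exact.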

References

1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of Int. Conf. Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, September 1994.

2. Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94, pages 487-499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.

3. Rakesh Agrawal and Ramakrishnan Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, pages 3-14. IEEE, 1995.

4. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo. Fast discovery of association rules. In U. Fayyad et al., editors, Advances in Knowledge Discovery and Data Mining, pages 307-328. AAAI Press, Menlo Park, CA, 1996.

5. Charu C. Aggarwal, Yao Li, Philip S. Yu, and Ruoming Jin. On dense pattern mining in graph streams. Proc. VLDB Endow., 3(1-2):975-984, September 2010.

6. Tatsuya Asai, Hiroki Arimura, Kenji Abe, Shinji Kawasoe, and Setsuo Arikawa. Online algorithms for mining semi-structured data stream. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), pages 27-34. IEEE, 2002.

7. Tatsuya Asai, Kenji Abe, Shinji Kawasoe, Hiroki Arimura, and Setsuo Arikawa. Efficient algorithms for finding frequent substructures from semi-structured data streams. In New Frontiers in Artificial Intelligence, pages 29-45. Springer, 2007.

8. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In Proceedings of the 2002 ACM Symposium on Principles of Database Systems (PODS 2002) (Invited Paper). ACM Press, June 2002.
