frequent pattern mining. There are numerous variations of this mining task, including
different data windowing approaches and different guarantees of the counting accu-
racy. Though many improvements in efficiency have been made over the years, the
basic principle of LOSSY COUNTING remains one of the core approaches. To reduce
the memory requirements, some algorithms seek only the maximal frequent itemsets
or closed frequent itemsets. Because every frequent itemset can be recovered from these
(together with its exact support, in the case of closed itemsets), they act as a form of data
compression. In these cases, it may be possible to discover an exact result set. Recently,
several algorithms have considered
the problem of uncertain data. In addition to frequent itemset mining, a few works
have studied frequent subsequences and subgraphs. In the future, we can expect that
all forms of frequent pattern mining developed for static datasets will be investigated
for streaming data as well.
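LOSSY COUNTING is named above as one of the core approaches. The following is a minimal sketch of its single-item form, written for illustration; the class and method names (`LossyCounter`, `add`, `frequent`) are hypothetical. The stream is divided into buckets of width ceil(1/epsilon). Each tracked item keeps a count f and a maximum undercount delta, and at every bucket boundary the entries with f + delta no larger than the current bucket id are pruned. The itemset version used for frequent itemset mining processes transactions several buckets at a time and is not shown here.

```python
import math
from typing import Dict, Hashable, List


class LossyCounter:
    """Single-item Lossy Counting with user-chosen error bound epsilon (sketch)."""

    def __init__(self, epsilon: float) -> None:
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must lie in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1.0 / epsilon)  # bucket width w = ceil(1/epsilon)
        self.n = 0  # number of items seen so far (N)
        # item -> [f, delta]: f counts occurrences since the entry was created,
        # delta bounds how many earlier occurrences may have been missed.
        self.entries: Dict[Hashable, List[int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries with f + delta <= bucket id;
        # such items have occurred at most bucket <= epsilon * N times so far.
        if self.n % self.width == 0:
            self.entries = {
                e: fd for e, fd in self.entries.items() if fd[0] + fd[1] > bucket
            }

    def frequent(self, support: float) -> Dict[Hashable, int]:
        """Report items whose estimated count f >= (support - epsilon) * N."""
        threshold = (support - self.epsilon) * self.n
        return {e: fd[0] for e, fd in self.entries.items() if fd[0] >= threshold}


if __name__ == "__main__":
    lc = LossyCounter(epsilon=0.1)
    for x in ["a"] * 50 + ["b", "c", "d", "e", "f"] * 10:
        lc.add(x)
    print(lc.frequent(0.3))  # {'a': 50}
```

Under these rules an item's estimate f never exceeds its true count and falls short of it by at most epsilon N, so querying with threshold (s - epsilon)N reports every item whose true support is at least s, at the cost of possible false positives. In the usage above, b through f each end with f = 10 and delta = 5, below the reporting threshold of 20. The pruning step is what keeps memory small: the original analysis bounds the table at (1/epsilon) log(epsilon N) entries.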
One technique that has not yet been studied much is mining from a sample of
the data stream. This may be an interesting area for future work. Compared with
existing sampling techniques [12, 18, 73] for frequent itemset mining on disk-resident
datasets, sampling data streams raises some new issues. For example, the
underlying distribution of the data stream can change over time, so the sampling
procedure needs to adapt as the stream evolves. However, such drift is hard to detect
accurately without mining the set of frequent itemsets directly. In addition, the space
required by the sample itself can be an issue. As pointed out by Manku and Motwani [60],
methods similar to concise sampling [29] might help reduce this space requirement and
achieve better mining results.
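To make the space argument concrete, here is a minimal sketch in the spirit of concise sampling [29], under stated assumptions. The sample is stored as (value, count) pairs, and each arrival enters the sample with probability 1/tau. When the space bound is exceeded, tau is raised and every sampled copy is kept independently with probability tau/tau_new, so each arrival remains in the sample with probability 1/tau_new. For brevity this sketch bounds only the number of distinct values stored, which is a simplifying assumption. The class name, the `max_entries` and `growth` parameters, and the estimator methods are hypothetical. It does not address the drift problem discussed above. For transactions, each transaction (for example a `frozenset` of items) would be the sampled value.

```python
import random
from typing import Dict, Hashable, Optional


class ConciseSample:
    """Bernoulli stream sample stored as (value, count) pairs under a space bound."""

    def __init__(self, max_entries: int, growth: float = 2.0,
                 seed: Optional[int] = None) -> None:
        if max_entries < 1 or growth <= 1.0:
            raise ValueError("need max_entries >= 1 and growth > 1")
        self.max_entries = max_entries  # assumed space bound: distinct values stored
        self.growth = growth            # factor applied whenever tau is raised
        self.tau = 1.0                  # each arrival is sampled with prob 1/tau
        self.n = 0                      # stream length seen so far
        self.counts: Dict[Hashable, int] = {}
        self.rng = random.Random(seed)

    def add(self, item: Hashable) -> None:
        self.n += 1
        if self.rng.random() < 1.0 / self.tau:
            self.counts[item] = self.counts.get(item, 0) + 1
        # Over budget: raise tau and thin the sample until it fits again.
        while len(self.counts) > self.max_entries:
            self._raise_threshold()

    def _raise_threshold(self) -> None:
        new_tau = self.tau * self.growth
        # Survival prob tau/new_tau keeps each arrival in the sample w.p. 1/new_tau.
        keep = self.tau / new_tau
        thinned: Dict[Hashable, int] = {}
        for item, c in self.counts.items():
            # Each of the c sampled copies survives independently.
            kept = sum(1 for _ in range(c) if self.rng.random() < keep)
            if kept:
                thinned[item] = kept
        self.counts = thinned
        self.tau = new_tau

    def estimate_count(self, item: Hashable) -> float:
        """Scale the sampled count by tau to estimate the item's stream count."""
        return self.counts.get(item, 0) * self.tau

    def estimate_support(self, item: Hashable) -> float:
        return self.estimate_count(item) / self.n if self.n else 0.0


if __name__ == "__main__":
    cs = ConciseSample(max_entries=20, seed=42)
    for i in range(10_000):
        cs.add("hot" if i % 2 == 0 else i % 500)  # one heavy value, many rare ones
    print(cs.tau, len(cs.counts), cs.estimate_support("hot"))
```

Storing repeated values as a single pair is what lets a concise sample represent more of the stream than a plain list of sample points in the same space. The price is a larger tau, and therefore noisier estimates, for streams with many distinct values.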
References
1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of Int. Conf.
Very Large Data Bases (VLDB'94), pages 487-499, Santiago, Chile, September 1994.
2. Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in
large databases. In Proceedings of the 20th International Conference on Very Large Data Bases,
VLDB '94, pages 487-499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers
Inc.
3. Rakesh Agrawal and Ramakrishnan Srikant. Mining sequential patterns. In Proceedings of the
Eleventh International Conference on Data Engineering (ICDE 1995), pages 3-14. IEEE, 1995.
4. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo. Fast discovery of
association rules. In U. Fayyad et al., editors, Advances in Knowledge Discovery and Data
Mining, pages 307-328. AAAI Press, Menlo Park, CA, 1996.
5. Charu C. Aggarwal, Yao Li, Philip S. Yu, and Ruoming Jin. On dense pattern mining in graph
streams. Proc. VLDB Endow., 3(1-2):975-984, September 2010.
6. Tatsuya Asai, Hiroki Arimura, Kenji Abe, Shinji Kawasoe, and Setsuo Arikawa. Online
algorithms for mining semi-structured data stream. In Proceedings of the 2002 IEEE
International Conference on Data Mining (ICDM 2002), pages 27-34. IEEE, 2002.
7. Tatsuya Asai, Kenji Abe, Shinji Kawasoe, Hiroki Arimura, and Setsuo Arikawa. Efficient
algorithms for finding frequent substructures from semi-structured data streams. In New
Frontiers in Artificial Intelligence, pages 29-45. Springer, 2007.
8. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream
Systems. In Proceedings of the 2002 ACM Symposium on Principles of Database Systems
(PODS 2002) (Invited Paper). ACM Press, June 2002.