Big Data Frequent Pattern Mining - Frequent Pattern Mining


their work with idling ones or not. Another open problem is that of mining sub-patterns in a large object, where sub-patterns can span multiple processes' data. Current methods for sequence motif mining and frequent subgraph mining in a large graph either rely on maximum pattern length constraints, which allow each process to store overlapping data partition boundaries, or transfer data partitions among all processes during each iteration of the algorithm. Neither solution scales when presented with Big Data, calling for efficient methods to solve this problem exactly.
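As a rough illustration of the first strategy described above, the following sketch partitions a sequence so that each chunk also carries the next max_len - 1 symbols, which lets every sub-pattern of length at most max_len be counted locally. It is a minimal single-machine simulation, not any published algorithm: the function names (partition_with_overlap, count_local, count_distributed) and the choice of contiguous substrings as "sub-patterns" are assumptions made for this example, and the loop over partitions stands in for separate processes.

```python
from collections import Counter


def partition_with_overlap(seq, num_parts, max_len):
    """Split seq into about num_parts chunks (hypothetical helper).

    Each chunk owns `own` symbols and additionally stores the next
    max_len - 1 symbols, so any pattern of length <= max_len that starts
    in the owned region is fully visible to the owning process.
    """
    size = max(1, -(-len(seq) // num_parts))  # ceiling division, at least 1
    parts = []
    for start in range(0, len(seq), size):
        own = min(size, len(seq) - start)
        chunk = seq[start:start + own + max_len - 1]
        parts.append((chunk, own))
    return parts


def count_local(chunk, own, max_len):
    """Count substrings of length <= max_len that start in the owned region."""
    counts = Counter()
    for i in range(own):
        for k in range(1, max_len + 1):
            if i + k <= len(chunk):
                counts[chunk[i:i + k]] += 1
    return counts


def count_distributed(seq, num_parts, max_len):
    """Sum the local counts; each loop iteration plays the role of one process."""
    total = Counter()
    for chunk, own in partition_with_overlap(seq, num_parts, max_len):
        total += count_local(chunk, own, max_len)
    return total


# The partitioned count agrees with a count over the unpartitioned sequence.
seq = "ACGTACGTAC"
assert count_distributed(seq, 3, 3) == count_local(seq, len(seq), 3)
```

Because every start position is owned by exactly one partition, nothing is counted twice. The extra storage per partition grows with max_len, so without a length bound the overlap can approach the size of the whole object, which is consistent with the scaling concern raised above.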

