Furthermore, we covered the fundamentals of the knowledge discovery in databases (KDD) process. From both, we learned that with regard to human involvement and interactivity the current situation is far from satisfactory. We worked out the basic problem and then tackled it from three sides:
First, there is the algorithmic complexity. We demonstrated that today's state-of-the-art algorithms offer impressive performance given the immense search space they have to deal with. Nevertheless, we came to the conclusion that this is still not enough to allow true interactivity in a human-centered KDD process. We therefore present a rule caching scheme that significantly reduces the number of mining runs. This scheme helps to achieve interactivity even when the mining algorithms have extreme run times, since accessing a properly implemented cache takes only seconds.
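The conclusion does not say how the caching scheme decides when a cached result can be reused. One simple policy that makes reuse safe: a rule's support and confidence do not depend on the thresholds, so a cache holding every rule with support >= s_c and confidence >= c_c can answer any later query with s >= s_c and c >= c_c by filtering alone, without a new mining run. The Python sketch below assumes exactly that policy; `Rule`, `RuleCache`, `toy_miner`, and the sample rules are hypothetical names and data, and this is not necessarily the authors' scheme.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float


# Hypothetical miner signature: (min_support, min_confidence) -> rules.
Miner = Callable[[float, float], List[Rule]]


class RuleCache:
    """Hypothetical rule cache: keeps the result of the most permissive mining run."""

    def __init__(self) -> None:
        self.min_support: Optional[float] = None
        self.min_confidence: Optional[float] = None
        self.rules: List[Rule] = []
        self.mining_runs = 0

    def can_answer(self, min_support: float, min_confidence: float) -> bool:
        # Thresholds at least as strict as the cached ones select a subset
        # of the cached rules, so no new mining run is needed.
        return (
            self.min_support is not None
            and self.min_confidence is not None
            and min_support >= self.min_support
            and min_confidence >= self.min_confidence
        )

    def query(self, min_support: float, min_confidence: float, miner: Miner) -> List[Rule]:
        if not self.can_answer(min_support, min_confidence):
            # Cache miss: re-mine with the looser of the old and new thresholds,
            # so the cache never becomes less permissive than before.
            if self.min_support is not None and self.min_confidence is not None:
                run_support = min(min_support, self.min_support)
                run_confidence = min(min_confidence, self.min_confidence)
            else:
                run_support, run_confidence = min_support, min_confidence
            self.rules = miner(run_support, run_confidence)
            self.min_support, self.min_confidence = run_support, run_confidence
            self.mining_runs += 1
        # Answering from the cache is a cheap filter over the stored rules.
        return [r for r in self.rules
                if r.support >= min_support and r.confidence >= min_confidence]


# Toy stand-in for an expensive mining algorithm (hypothetical data).
ALL_RULES = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.30, 0.80),
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.20, 0.60),
    Rule(frozenset({"beer"}), frozenset({"chips"}), 0.05, 0.90),
]


def toy_miner(min_support: float, min_confidence: float) -> List[Rule]:
    return [r for r in ALL_RULES
            if r.support >= min_support and r.confidence >= min_confidence]


cache = RuleCache()
print(len(cache.query(0.10, 0.50, toy_miner)))  # 2 (one mining run)
print(len(cache.query(0.25, 0.70, toy_miner)))  # 1 (served from the cache)
print(cache.mining_runs)                        # 1
```

A real cache would additionally have to be invalidated whenever the underlying data change; the sketch ignores that to keep the reuse rule visible.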
Second, we pointed out that integrating the mining algorithm with the other KDD phases is another crucial aspect. Interactivity suffers tremendously when moving from one KDD phase to the next is not smooth but requires annoying intervention by the user. To address this, we present an efficient integration of association rule mining algorithms with modern database systems.
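The conclusion does not describe how the integration works. As a hedged illustration of the general idea of letting the database system do part of the mining work, rather than exporting the data to a separate tool first, the Python sketch below counts the support of item pairs with a SQL self-join inside SQLite. The `baskets(tid, item)` table, the `frequent_pairs` function, and the sample data are hypothetical, and this is not the authors' integration.

```python
import sqlite3
from typing import List, Tuple


def frequent_pairs(conn: sqlite3.Connection, min_count: int) -> List[Tuple[str, str, int]]:
    """Count item pairs inside the DBMS; only frequent pairs leave the database."""
    sql = """
        SELECT a.item, b.item, COUNT(*) AS cnt
        FROM baskets AS a
        JOIN baskets AS b
          ON a.tid = b.tid AND a.item < b.item   -- each unordered pair once per transaction
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    return conn.execute(sql, (min_count,)).fetchall()


# Hypothetical transaction table: one row per (transaction id, item).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT, PRIMARY KEY (tid, item))")
conn.executemany("INSERT INTO baskets VALUES (?, ?)", [
    (1, "bread"), (1, "butter"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "milk"), (3, "bread"),
])
print(frequent_pairs(conn, 2))  # [('bread', 'butter', 2), ('bread', 'milk', 2)]
```

The primary key on `(tid, item)` matters: without it, a duplicated row would inflate the pair counts.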
Third, the data mining analyst must pick the interesting rules from the set of generated rules. This can be quite costly because generated rule sets are normally very large (more than 100,000 rules are not uncommon), whereas useful rules typically make up only a very small fraction of them. We enhanced the traditional association rule mining framework by giving structure to the items. Adding attributes to the items as proposed does not affect the mining procedure, but it introduces a new means to formulate practically important mining queries.
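To make the last point concrete, here is a minimal Python sketch, assuming item attributes are kept in a separate lookup table that the mining algorithm never reads; they are only consulted afterwards, when the analyst filters the generated rules. `ITEM_ATTRS`, `select_rules`, the attribute names `category` and `price`, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float


# Hypothetical item attributes; mining works on item names only.
ITEM_ATTRS: Dict[str, Dict[str, object]] = {
    "bread":  {"category": "bakery",    "price": 2.0},
    "butter": {"category": "dairy",     "price": 3.5},
    "milk":   {"category": "dairy",     "price": 1.0},
    "wine":   {"category": "beverages", "price": 12.0},
}

ItemPredicate = Callable[[Dict[str, object]], bool]


def select_rules(rules: List[Rule],
                 antecedent_pred: Optional[ItemPredicate] = None,
                 consequent_pred: Optional[ItemPredicate] = None) -> List[Rule]:
    """Keep rules where some antecedent item satisfies antecedent_pred and
    some consequent item satisfies consequent_pred (None matches anything)."""
    def matches(items: FrozenSet[str], pred: Optional[ItemPredicate]) -> bool:
        return pred is None or any(pred(ITEM_ATTRS[i]) for i in items)
    return [r for r in rules
            if matches(r.antecedent, antecedent_pred)
            and matches(r.consequent, consequent_pred)]


rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.30, 0.80),
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.20, 0.60),
    Rule(frozenset({"bread", "butter"}), frozenset({"wine"}), 0.02, 0.40),
]

# "Which rules predict the purchase of an item priced above 10?"
expensive = select_rules(rules, consequent_pred=lambda a: a["price"] > 10)
# "Which rules lead from dairy products to bakery products?"
dairy_to_bakery = select_rules(
    rules,
    antecedent_pred=lambda a: a["category"] == "dairy",
    consequent_pred=lambda a: a["category"] == "bakery",
)
print(len(expensive))                         # 1
print(sorted(dairy_to_bakery[0].antecedent))  # ['milk']
```

Matching on "some item" (`any`) is one design choice; requiring every item to satisfy the predicate (`all`) would express a different, stricter query.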