The reasons that lead to three different clusters of domains in terms of convergence behaviour need to be analysed. Analysing the influence of pruning on the error and on its bias/variance decomposition would also be of interest in this study.
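For concreteness, the decomposition referred to here can be written in its classical squared-loss form as below; this is an illustrative formulation only, since for 0/1 classification loss one of the analogous decompositions (e.g. those of Kohavi and Wolpert, or of Domingos) would be used in practice:

\mathrm{Err}(x) \;=\; \underbrace{\bigl(\mathbb{E}_{D}[\hat{f}_{D}(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}} \;+\; \underbrace{\mathbb{E}_{D}\bigl[\bigl(\hat{f}_{D}(x) - \mathbb{E}_{D}[\hat{f}_{D}(x)]\bigr)^{2}\bigr]}_{\text{variance}} \;+\; \underbrace{\sigma^{2}}_{\text{noise}}

where \hat{f}_{D} is the classifier induced from training sample D, f is the target function and \sigma^{2} the irreducible noise.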
The CTC algorithm provides a way to deal with the need to resample the training set. We are currently working on quantifying the influence that changes in the class distribution have on the CTC algorithm. It would also be interesting to compare these results with those obtained by other techniques that use resampling to improve the accuracy of the classifier, such as bagging and boosting, although such ensembles completely lose the explanatory capacity of a single tree.
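As an illustration of the kind of class-distribution change mentioned above, the following sketch subsamples a two-class training set to a chosen positive-class fraction. It is a minimal, hypothetical sketch only; the function name and parameters are not taken from the CTC implementation.

# Minimal, hypothetical sketch (not the CTC implementation): draw a subsample
# of a two-class training set in which the positive class (label 1) makes up
# roughly `pos_fraction` of the examples.
import random

def resample_with_distribution(labels, pos_fraction, size, seed=0):
    """Return indices of a subsample of `size` examples with the requested
    positive-class fraction; the remainder is drawn from the negative class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = min(len(pos), int(round(size * pos_fraction)))
    n_neg = min(len(neg), size - n_pos)
    sample = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(sample)
    return sample

# Example: a 100-example subsample with a balanced (50/50) class distribution.
# idx = resample_with_distribution(y_train, pos_fraction=0.5, size=100)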
Acknowledgments
The work described in this paper was partly carried out under the University of the Basque Country (UPV/EHU) project 1/UPV 00139.226-T-15920/2004. It was also funded by the Diputación Foral de Guipuzcoa and the European Union.
We would like to thank the company Fagor Electrodomesticos, S. COOP. for permitting us to use their data (Faithful), obtained through the BETIKO project. The lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.
References
1. Bauer, E., Kohavi, R.: An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, Vol. 36 (1999) 105-139.
2. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. University of California, Irvine, Dept. of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).
3. Breiman, L.: Bagging Predictors. Machine Learning, Vol. 24 (1996) 123-140.
4. Chan, P.K., Stolfo, S.J.: Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (1998) 164-168.
5. Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, Vol. 10, No. 7 (1998) 1895-1924.
6. Dietterich, T.G.: An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, Vol. 40 (2000) 139-157.
7. Domingos, P.: Knowledge Acquisition from Examples via Multiple Models. Proceedings of the 14th International Conference on Machine Learning, Nashville, TN (1997) 98-106.
8. Drummond, C., Holte, R.C.: Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. Proceedings of the 17th International Conference on Machine Learning (2000) 239-246.
9. Elkan, C.: The Foundations of Cost-Sensitive Learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence (2001) 973-978.
10. Freund, Y., Schapire, R.E.: Experiments with a New Boosting Algorithm. Proceedings of the 13th International Conference on Machine Learning (1996) 148-156.