The reasons that lead to three different clusters of domains in terms of convergence behaviour need to be analysed. Analysing the influence of pruning on the error and on its bias/variance decomposition would also be of interest in this study.
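For concreteness, the decomposition referred to here can be written in its classical squared-loss form as below; this is an illustrative formulation only, since for 0/1 classification loss one of the analogous decompositions (e.g. those of Kohavi and Wolpert, or of Domingos) would be used in practice:

\mathrm{Err}(x) \;=\; \underbrace{\bigl(\mathbb{E}_{D}[\hat{f}_{D}(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}} \;+\; \underbrace{\mathbb{E}_{D}\bigl[\bigl(\hat{f}_{D}(x) - \mathbb{E}_{D}[\hat{f}_{D}(x)]\bigr)^{2}\bigr]}_{\text{variance}} \;+\; \underbrace{\sigma^{2}}_{\text{noise}}

where \hat{f}_{D} is the classifier induced from training sample D, f is the target function and \sigma^{2} the irreducible noise.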
The CTC algorithm provides a way to deal with the need to resample the training set. We are currently working on quantifying the influence that changes in the class distribution have on the CTC algorithm. It would also be interesting to compare these results with those obtained by other techniques that use resampling to improve the accuracy of the classifier, such as bagging and boosting, although such ensembles completely lose the explanatory capacity of a single tree.
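As an illustration of the kind of class-distribution change mentioned above, the following sketch subsamples a two-class training set to a chosen positive-class fraction. It is a minimal, hypothetical sketch only; the function name and parameters are not taken from the CTC implementation.

# Minimal, hypothetical sketch (not the CTC implementation): draw a subsample
# of a two-class training set in which the positive class (label 1) makes up
# roughly `pos_fraction` of the examples.
import random

def resample_with_distribution(labels, pos_fraction, size, seed=0):
    """Return indices of a subsample of `size` examples with the requested
    positive-class fraction; the remainder is drawn from the negative class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = min(len(pos), int(round(size * pos_fraction)))
    n_neg = min(len(neg), size - n_pos)
    sample = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(sample)
    return sample

# Example: a 100-example subsample with a balanced (50/50) class distribution.
# idx = resample_with_distribution(y_train, pos_fraction=0.5, size=100)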
Acknowledgments
The work described in this paper was partly carried out under the University of the Basque Country (UPV/EHU) project 1/UPV 00139.226-T-15920/2004. It was also funded by the Diputación Foral de Guipuzcoa and the European Union.
We would like to thank the company Fagor Electrodomesticos, S. COOP. for permitting us to use their data (Faithful), obtained through the BETIKO project. The lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.
References
1. Bauer, E., Kohavi, R.: An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, Vol. 36 (1999) 105-139.
2. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. University of California, Irvine, Dept. of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).
3. Breiman, L.: Bagging Predictors. Machine Learning, Vol. 24 (1996) 123-140.
4. Chan, P.K., Stolfo, S.J.: Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection. Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (1998) 164-168.
5. Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, Vol. 10, No. 7 (1998) 1895-1924.
6. Dietterich, T.G.: An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, Vol. 40 (2000) 139-157.
7. Domingos, P.: Knowledge Acquisition from Examples via Multiple Models. Proceedings of the 14th International Conference on Machine Learning, Nashville, TN (1997) 98-106.
8. Drummond, C., Holte, R.C.: Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. Proceedings of the 17th International Conference on Machine Learning (2000) 239-246.
9. Elkan, C.: The Foundations of Cost-Sensitive Learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence (2001) 973-978.
10. Freund, Y., Schapire, R.E.: Experiments with a New Boosting Algorithm. Proceedings of the 13th International Conference on Machine Learning (1996) 148-156.