Parallel Random Prism was examined in terms of its runtime with respect to the number of computing
nodes used. In general, close to linear scalability was expected, as the main part
of the workload, the induction of the base classifiers, was parallelised. However, the data
communication to the cluster nodes at the beginning and the combining proce-
dures were not parallelised; hence an upper limit on the number of beneficial computing nodes
was expected. Section 5 further supported the theoretical analysis with empirical
results. These results confirmed Parallel Random Prism's linear scalability with respect
to the number of training instances and features. They also
showed that Parallel Random Prism exhibits an almost ideal speed-up for
up to 10 cluster nodes, with a gradually increasing deterioration as more clus-
ter nodes are utilised. The results suggested that there is an upper limit on the achievable speed-up (due
to the non-parallel parts of Parallel Random Prism). However, the results also
suggested that the cluster is still far from its maximum number of beneficial
cluster nodes.
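The ceiling on beneficial cluster nodes follows the usual Amdahl's law reasoning: if some fraction of the runtime (here, the initial data communication and the combining procedures) remains serial, the speed-up saturates as nodes are added. The following minimal Python sketch illustrates this; the serial fraction of 1% is a hypothetical value chosen for illustration, not a measurement reported for Parallel Random Prism.

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Theoretical speed-up for a workload in which serial_fraction
    of the runtime cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

# Hypothetical serial fraction covering data distribution and the
# combining procedures; the paper does not report an exact value.
SERIAL_FRACTION = 0.01

for n in (1, 2, 4, 10, 20, 50, 100):
    print(f"{n:4d} nodes -> speed-up {amdahl_speedup(n, SERIAL_FRACTION):6.2f}")
```

Under this assumption the speed-up is about 9.2 at 10 nodes, still close to ideal, but it degrades thereafter (roughly 16.8 at 20 nodes) and is asymptotically bounded by 1/0.01 = 100, which is qualitatively consistent with the near-ideal speed-up up to 10 nodes and the gradual deterioration beyond that reported in Section 5.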