Parallel Random Prism was examined in terms of its runtime with respect to the number of computing
nodes used. In general, close to linear scalability was expected, as the main part
of the workload, the induction of the base classifiers, was parallelised. However, the data
communication to the cluster nodes at the beginning and the combining proce-
dures were not parallelised; hence an upper limit on the number of beneficial computing nodes
was expected. Section 5 further supported the theoretical analysis with empirical
results. These results confirmed Parallel Random Prism's linear scalability with respect
to the number of training instances and features. They also
showed that Parallel Random Prism exhibits an almost ideal speed-up for
up to 10 cluster nodes, with a gradually increasing deterioration as more clus-
ter nodes are utilised. The results suggested that there is an upper limit on the achievable speed-up (due
to the non-parallel parts of Parallel Random Prism). However, the results also
suggested that the cluster is still far from its maximum number of beneficial
cluster nodes.
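The ceiling on beneficial cluster nodes follows the usual Amdahl's law reasoning: if some fraction of the runtime (here, the initial data communication and the combining procedures) remains serial, the speed-up saturates as nodes are added. The following minimal Python sketch illustrates this; the serial fraction of 1% is a hypothetical value chosen for illustration, not a measurement reported for Parallel Random Prism.

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Theoretical speed-up for a workload in which serial_fraction
    of the runtime cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

# Hypothetical serial fraction covering data distribution and the
# combining procedures; the paper does not report an exact value.
SERIAL_FRACTION = 0.01

for n in (1, 2, 4, 10, 20, 50, 100):
    print(f"{n:4d} nodes -> speed-up {amdahl_speedup(n, SERIAL_FRACTION):6.2f}")
```

Under this assumption the speed-up is about 9.2 at 10 nodes, still close to ideal, but it degrades thereafter (roughly 16.8 at 20 nodes) and is asymptotically bounded by 1/0.01 = 100, which is qualitatively consistent with the near-ideal speed-up up to 10 nodes and the gradual deterioration beyond that reported in Section 5.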