A Scalable Expressive Ensemble Learning Using Random Prism: A MapReduce Approach - Transactions on Large-Scale Data- and Knowledge-Centered Systems XX

Fig. 2. A typical setup of a Hadoop computing cluster. A physical node in the computer cluster can execute more than one Mapper and Reducer.

3.2 Parallel Random Prism Classifier

Several aspects of the Random Prism algorithm have to be considered for the parallelisation through data parallelism with the MapReduce paradigm. These are the bagging procedure, the induction of R-PrismTCS classifiers and the combination of the individual classifiers into a composite classifier.

Induction of R-PrismTCS Classifiers. As mentioned in Sect. 3.1, Random Prism can be broken down into multiple R-PrismTCS classifiers induced on bagged samples of the training data. These R-PrismTCS classifiers can be induced independently. The only operation that requires the input of all classifiers is the aggregation of their individual sets of classification rules and their weights. Hence, the induction of an R-PrismTCS classifier is implemented directly in a Mapper. Multiple instances of this Mapper can be executed concurrently in a Hadoop cluster. If there are more Mapper instances than computing nodes, then several Mappers queue to be executed on a node, which keeps the computing nodes utilised through pipelining. However, p Mappers still execute concurrently at any one time, where p is the number of available computing nodes in the cluster. While the last Mappers are executing on the cluster there may be a small synchronisation overhead, as some Mappers may finish earlier than others and thus leave some of the computing nodes idle, but only in the very last stage of the algorithm's execution.
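
The scheme described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions rather than the Hadoop implementation: a thread pool of size p stands in for the cluster, surplus Mapper tasks wait in the pool's queue, and the final list collects the rule sets and weights of all base classifiers. The names bootstrap_sample, induce_classifier, mapper and run_ensemble are hypothetical, and induce_classifier is only a placeholder (a majority-class rule) that does not reproduce R-PrismTCS.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement (a standard bagging sample)."""
    return [rng.choice(data) for _ in data]


def induce_classifier(sample):
    """Hypothetical stand-in for R-PrismTCS induction; returns (rules, weight).

    The 'rule set' is just the majority class of the sample, and the weight
    is that class's share of the sample. The real algorithm is not shown.
    """
    counts = Counter(label for _, label in sample)
    majority, freq = counts.most_common(1)[0]
    return [("default", majority)], freq / len(sample)


def mapper(task):
    """One Mapper: build a bagged sample and induce one base classifier."""
    mapper_id, data, seed = task
    rng = random.Random(seed)  # per-Mapper seed keeps runs reproducible
    rules, weight = induce_classifier(bootstrap_sample(data, rng))
    return mapper_id, rules, weight


def run_ensemble(data, num_mappers, num_nodes, seed=0):
    """Run num_mappers Mappers with num_nodes of them executing at once."""
    tasks = [(i, data, seed + i) for i in range(num_mappers)]
    # The pool size plays the role of p; surplus tasks wait in the queue.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(mapper, tasks))  # order follows tasks
    # Aggregation: the only step that needs the output of all classifiers.
    return [(rules, weight) for _, rules, weight in results]


if __name__ == "__main__":
    toy_data = [((0,), "a"), ((1,), "a"), ((2,), "b")] * 5
    ensemble = run_ensemble(toy_data, num_mappers=7, num_nodes=3)
    print(len(ensemble), "base classifiers induced")
```

With 7 Mappers and 3 workers, the tasks are processed as workers become free, so some workers may sit idle only while the final tasks complete, which mirrors the synchronisation overhead noted above.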
