Fig. 2. A typical setup of a Hadoop computing cluster. A physical node in the computer
cluster can execute more than one Mapper and Reducer.
3.2 Parallel Random Prism Classifier
Several aspects of the Random Prism algorithm have to be considered when
parallelising it through data parallelism with the MapReduce paradigm. These
are the bagging procedure, the induction of R-PrismTCS classifiers and the
combination of the individual classifiers into a composite classifier.
Induction of R-PrismTCS Classifiers. As mentioned in Sect. 3.1, Random
Prism can be broken down into multiple R-PrismTCS classifiers induced
on bagged samples of the training data. These R-PrismTCS classifiers can be
induced independently. The only operation that requires the input of all
classifiers is the aggregation of their individual sets of classification rules and their
weights. Hence, the induction of an R-PrismTCS classifier is implemented directly
in a Mapper. Multiple instances of this Mapper can be executed concurrently
in a Hadoop cluster. If there are more Mapper instances than computing
nodes, then several Mappers queue to be executed on a node. Thus we keep the
computational nodes utilised through pipelining. At any one time, p
Mappers still execute concurrently, where p is the number of available
computing nodes in the cluster. When the last Mappers are executed on the
cluster there may be a small synchronisation overhead, as some Mappers may finish
earlier than others and leave some of the computational nodes idle, but only
in the very last stage of the algorithm's execution.
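
The scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Hadoop implementation: a thread pool with p workers stands in for the cluster (one Mapper per node at a time), and induce_base_classifier is a hypothetical placeholder that learns only the majority class rather than running R-PrismTCS. The names bagged_sample, run_mappers and waves are likewise invented for this sketch.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bagged_sample(data, rng):
    """Draw a bootstrap sample: len(data) draws with replacement."""
    return [rng.choice(data) for _ in data]


def induce_base_classifier(task):
    """Stand-in for one Mapper (assumption: a real Mapper would induce an
    R-PrismTCS rule set here; this placeholder learns the majority class)."""
    mapper_id, sample = task
    counts = Counter(label for _, label in sample)
    majority, hits = counts.most_common(1)[0]
    rules = [f"IF true THEN class = {majority}"]  # placeholder rule set
    weight = hits / len(sample)                    # placeholder weight
    return mapper_id, rules, weight


def run_mappers(data, num_mappers, num_nodes, seed=0):
    """Induce num_mappers base classifiers with num_nodes running at once."""
    rng = random.Random(seed)
    tasks = [(i, bagged_sample(data, rng)) for i in range(num_mappers)]
    # At most num_nodes tasks run at once; the rest wait in the pool's queue.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(induce_base_classifier, tasks))
    # The only step that needs every classifier: collect rules and weights.
    return {mapper_id: (rules, weight) for mapper_id, rules, weight in results}


def waves(num_mappers, num_nodes):
    """Scheduling rounds needed if every Mapper took equally long."""
    return -(-num_mappers // num_nodes)  # ceiling division
```

For example, run_mappers(data, num_mappers=10, num_nodes=4) queues ten induction tasks onto four workers. If all tasks took equally long, waves(10, 4) gives three rounds, and in the last round only two of the four workers would be busy, which corresponds to the idle time in the final stage noted above.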