The terms 'parallel' and 'distributed' are used interchangeably in the context of algorithms in this paper, both referring to the concurrent execution of base classifiers through the distribution of the training data to multiple computer cluster nodes.
This paper provides a detailed and exhaustive description of the Random Prism and Parallel Random Prism approaches. Additionally, it provides, for the first time, a formal theoretical scalability analysis of Random Prism and Parallel Random Prism, which examines their scalability to much larger computer clusters. This contribution provides a theoretical underpinning that can be used for the scalability analysis of algorithms based on the MapReduce framework. It also presents a thorough experimental study of Parallel Random Prism's scalability, in particular with respect to the number of training examples and the number of features. It is worth noting that we use the terms 'feature' and 'attribute' interchangeably in this paper.
This paper's structure is as follows: Sect. 2 presents the Random Prism ensemble learner. The parallel version of Random Prism is outlined in Sect. 3. Section 4 provides a theoretical scalability analysis of a standalone Prism classifier, the Random Prism ensemble learner and then the Parallel Random Prism approach. This formal scalability analysis is then supported by an empirical evaluation in Sect. 5. Finally, Sect. 6 closes the paper with some concluding remarks.
2 Random Prism
As aforementioned, Random Prism is inspired by RDF and RF. Ho's RDF approach induces multiple trees, each on a random subset of the feature space [14]. This is done in order to make the individual trees generalise better on the training data, which Ho evaluated empirically. RF, similarly to RDF, induces the trees on feature subsets. However, differently from RDF, RF uses a new random subset of the feature space for evaluating the possible splits of each node in the decision tree [7]. In addition, RF also uses 'Bagging' [6] in order to further increase the predictive accuracy of the ensemble classifier. According to [12], this is because the composite classifier model reduces the variance of the individual classifiers. However, the authors of [11] suggest that bagging does not necessarily reduce variance, but rather equalises the influence of the training examples and thus
stabilises the classifier. Bagging builds for each base classifier a bootstrap sample Di of a training dataset D using sampling with replacement [6]. Most commonly, Di is of size N, where N is the number of training instances in D.
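As an illustration, the following is a minimal sketch of the two sampling steps described above: drawing a bootstrap sample of size N with replacement (bagging) and drawing a random subset of the feature space (as in RDF/RF-style ensemble members). It is not the authors' implementation; the function names and the toy data are assumptions for illustration only.

# Hypothetical sketch: bootstrap sampling with replacement (bagging) and
# random feature subset selection for one ensemble member.
import random

def bootstrap_sample(dataset):
    """Draw a bootstrap sample D_i of size N from dataset D (with replacement)."""
    n = len(dataset)
    return [dataset[random.randrange(n)] for _ in range(n)]

def random_feature_subset(features, subset_size):
    """Select a random subset of the feature space (without replacement)."""
    return random.sample(features, subset_size)

# Example usage with toy data:
D = [([1.0, 0.2, 5.1], 'a'), ([0.3, 1.7, 2.4], 'b'), ([2.2, 0.9, 0.5], 'a')]
features = ['f1', 'f2', 'f3']
D_i = bootstrap_sample(D)                 # training data for one base classifier
F_i = random_feature_subset(features, 2)  # candidate features for that classifier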
In this paper, we adopt PrismTCS, which is a computationally efficient member of the Prism family and maintains a predictive accuracy similar to that of the original Prism classifier [5]. Good computational efficiency is needed because ensemble learners generally do not scale well to large datasets; due to bagging, even modestly sized training data present a considerable computational challenge to ensemble learners. In addition, the implemented base classifier makes use of J-pruning, as it not only makes the induced classifier generalise better, but also lowers its runtime [19]. This is because J-pruning reduces the number of rule terms induced and thus lowers the number of iterations of the base classifier [19].
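To make the separate-and-conquer strategy of the Prism family concrete, the following is a simplified, hypothetical sketch of a PrismTCS-style induction loop over categorical attributes: each rule targets the minority class of the remaining training data, greedily adds attribute-value terms that maximise the conditional probability of the target class, and the covered instances are then removed. J-pruning and other details of the actual algorithm [5, 19] are omitted, and all names are illustrative assumptions rather than the authors' code.

# Simplified, hypothetical sketch of a PrismTCS-style separate-and-conquer loop.
from collections import Counter

def induce_rules(instances):
    """instances: list of (attribute_dict, class_label) pairs with categorical values."""
    rules = []
    remaining = list(instances)
    while remaining and len(set(label for _, label in remaining)) > 1:
        # PrismTCS targets the minority class of the remaining data first.
        target = Counter(label for _, label in remaining).most_common()[-1][0]
        covered, terms, used = list(remaining), [], set()
        # Greedily add attribute-value terms maximising p(target | term).
        while any(label != target for _, label in covered):
            best, best_p = None, -1.0
            for atts, _ in covered:
                for att, val in atts.items():
                    if att in used:
                        continue
                    subset = [(a, l) for a, l in covered if a.get(att) == val]
                    p = sum(1 for _, l in subset if l == target) / len(subset)
                    if p > best_p:
                        best, best_p = (att, val), p
            if best is None:           # no attributes left to specialise on
                break
            used.add(best[0])
            terms.append(best)
            covered = [(a, l) for a, l in covered if a.get(best[0]) == best[1]]
        rules.append((terms, target))
        # Separate-and-conquer: remove the instances covered by the new rule.
        remaining = [(a, l) for a, l in remaining
                     if not all(a.get(att) == val for att, val in terms)]
    return rules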