The terms 'parallel' and 'distributed' are used interchangeably in the context of algorithms in this paper, both referring to the concurrent execution of base classifiers through the distribution of the training data to multiple computer cluster nodes.
This paper provides a detailed and exhaustive description of the Random Prism and Parallel Random Prism approaches. Additionally, it provides, for the first time, a formal theoretical scalability analysis of Random Prism and Parallel Random Prism, which examines their scalability to much larger computer clusters. This contribution provides a theoretical underpinning that can be used for the scalability analysis of algorithms based on the MapReduce framework. It also presents a thorough experimental study of Parallel Random Prism's scalability, in particular with respect to the number of training examples and the number of features. It is worth noting that we use the terms 'feature' and 'attribute' interchangeably in this paper.
This paper's structure is as follows: Sect. 2 presents the Random Prism ensemble learner. The parallel version of Random Prism is outlined in Sect. 3. Section 4 provides a theoretical scalability analysis of a standalone Prism classifier, the Random Prism ensemble learner and then the Parallel Random Prism approach. This formal scalability analysis is then supported by an empirical evaluation in Sect. 5. Finally, Sect. 6 closes the paper with some concluding remarks.
2 Random Prism
As aforementioned, Random Prism is inspired by RDF and RF. Ho's RDF approach induces multiple trees, each on a random subset of the feature space [14]. This is done in order to make the individual trees generalise better on the training data, which Ho evaluated empirically. RF, similarly to RDF, induces the trees on feature subsets. However, differently from RDF, RF uses a new random subset of the feature space for evaluating the possible splits of each node in the decision tree [7]. In addition, RF also uses 'Bagging' [6] in order to further increase the predictive accuracy of the ensemble classifier. According to [12], this is because the composite classifier model reduces the variance of the individual classifiers. However, the authors of [11] suggest that bagging does not necessarily reduce variance, but rather equalises the influence of the training examples and thus
stabilises the classifier. Bagging builds for each base classifier a bootstrap sample Di of a training dataset D using sampling with replacement [6]. Most commonly, Di is of size N, where N is the number of training instances in D.
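As an illustration, the following is a minimal sketch of the two sampling steps described above: drawing a bootstrap sample of size N with replacement (bagging) and drawing a random subset of the feature space (as in RDF/RF-style ensemble members). It is not the authors' implementation; the function names and the toy data are assumptions for illustration only.

# Hypothetical sketch: bootstrap sampling with replacement (bagging) and
# random feature subset selection for one ensemble member.
import random

def bootstrap_sample(dataset):
    """Draw a bootstrap sample D_i of size N from dataset D (with replacement)."""
    n = len(dataset)
    return [dataset[random.randrange(n)] for _ in range(n)]

def random_feature_subset(features, subset_size):
    """Select a random subset of the feature space (without replacement)."""
    return random.sample(features, subset_size)

# Example usage with toy data:
D = [([1.0, 0.2, 5.1], 'a'), ([0.3, 1.7, 2.4], 'b'), ([2.2, 0.9, 0.5], 'a')]
features = ['f1', 'f2', 'f3']
D_i = bootstrap_sample(D)                 # training data for one base classifier
F_i = random_feature_subset(features, 2)  # candidate features for that classifier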
In this paper, we adopt PrismTCS, which is a computationally efficient member of the Prism family and maintains a predictive accuracy similar to that of the original Prism classifier [5]. Good computational efficiency is needed because ensemble learners generally do not scale well to large datasets; due to bagging, even modestly sized training data present a considerable computational challenge to ensemble learners. In addition, the implemented base classifier makes use of J-pruning, as it not only makes the induced classifier generalise better, but also lowers its runtime [19]. This is because J-pruning reduces the number of rule terms induced and thus lowers the number of iterations of the base classifier [19].
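To make the separate-and-conquer strategy of the Prism family concrete, the following is a simplified, hypothetical sketch of a PrismTCS-style induction loop over categorical attributes: each rule targets the minority class of the remaining training data, greedily adds attribute-value terms that maximise the conditional probability of the target class, and the covered instances are then removed. J-pruning and other details of the actual algorithm [5, 19] are omitted, and all names are illustrative assumptions rather than the authors' code.

# Simplified, hypothetical sketch of a PrismTCS-style separate-and-conquer loop.
from collections import Counter

def induce_rules(instances):
    """instances: list of (attribute_dict, class_label) pairs with categorical values."""
    rules = []
    remaining = list(instances)
    while remaining and len(set(label for _, label in remaining)) > 1:
        # PrismTCS targets the minority class of the remaining data first.
        target = Counter(label for _, label in remaining).most_common()[-1][0]
        covered, terms, used = list(remaining), [], set()
        # Greedily add attribute-value terms maximising p(target | term).
        while any(label != target for _, label in covered):
            best, best_p = None, -1.0
            for atts, _ in covered:
                for att, val in atts.items():
                    if att in used:
                        continue
                    subset = [(a, l) for a, l in covered if a.get(att) == val]
                    p = sum(1 for _, l in subset if l == target) / len(subset)
                    if p > best_p:
                        best, best_p = (att, val), p
            if best is None:           # no attributes left to specialise on
                break
            used.add(best[0])
            terms.append(best)
            covered = [(a, l) for a, l in covered if a.get(best[0]) == best[1]]
        rules.append((terms, target))
        # Separate-and-conquer: remove the instances covered by the new rule.
        remaining = [(a, l) for a, l in remaining
                     if not all(a.get(att) == val for att, val in terms)]
    return rules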