Fig. 3. The Parallel Random Prism Architecture on a Hadoop cluster with two computational nodes, four Mappers and one Reducer.
unlabelled data instances. The basic steps of Parallel Random Prism are outlined in Algorithm 2.
Algorithm 2 : Parallel Random Prism Algorithm
Step 1: Distribute a copy of the training data to each node in the cluster using
the Hadoop Distributed File System;
Step 2: Start k Mappers, where k is the number of R-PrismTCS classifiers
desired. Each Mapper performs the following steps, in order:
- Build a training and validation set using Bagging;
- Induce a ruleset by applying R-PrismTCS on the training data;
- Calculate the ruleset's weight using the validation data;
- Send the ruleset and its weight to the Reducer;
Step 3: Optionally, the Reducer applies a filter that eliminates the weakest
and retains the strongest rulesets according to their weights;
Step 4: The Reducer returns the final classifier, a set of R-PrismTCS
rulesets that classifies each new unlabelled data instance by weighted
majority voting;
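The map/reduce flow of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the R-PrismTCS rule inducer is replaced by a hypothetical majority-class stub (`induce_ruleset`), and the k Mappers run sequentially in one process rather than on a Hadoop cluster, so only the bagging, weighting, filtering, and weighted-voting logic is shown.

```python
import random
from collections import Counter

def bagging_sample(data, rng):
    """Draw a bootstrap sample; instances never drawn form the validation set."""
    n = len(data)
    drawn = [rng.randrange(n) for _ in range(n)]
    train = [data[i] for i in drawn]
    validation = [data[i] for i in range(n) if i not in set(drawn)]
    return train, validation

def induce_ruleset(train):
    """Stand-in for R-PrismTCS: predicts the majority class of its sample."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda instance: majority

def mapper(data, seed):
    """One Mapper: bag the data, induce a ruleset, weight it on validation data."""
    rng = random.Random(seed)
    train, validation = bagging_sample(data, rng)
    ruleset = induce_ruleset(train)
    correct = sum(1 for x, y in validation if ruleset(x) == y)
    weight = correct / len(validation) if validation else 0.0
    return ruleset, weight

def reducer(weighted_rulesets, keep=None):
    """Reducer: optionally keep only the strongest rulesets, then build the
    final classifier, which votes with each ruleset weighted by its accuracy."""
    ranked = sorted(weighted_rulesets, key=lambda rw: rw[1], reverse=True)
    if keep is not None:
        ranked = ranked[:keep]
    def classify(instance):
        votes = Counter()
        for ruleset, weight in ranked:
            votes[ruleset(instance)] += weight
        return votes.most_common(1)[0][0]
    return classify

# Usage: 20 labelled instances, k = 5 "Mappers", keep the 3 strongest rulesets.
data = [((i,), "pos" if i % 2 else "neg") for i in range(20)]
ensemble = reducer([mapper(data, seed) for seed in range(5)], keep=3)
prediction = ensemble((7,))
```

In the real system, each `mapper` call would be an independent Hadoop Mapper reading its copy of the training data from HDFS, and the single Reducer would receive only the (ruleset, weight) pairs.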
4 Theoretical Analysis of Parallel Random Prism
The complexity of PrismTCS is governed by the number of probability calculations
performed for candidate split values; in this paper this count is denoted the number of cutpoints.
In the ideal case, there would be one feature that perfectly separates all the
classes, or simply all data instances would belong to the same class. An average
 