Fig. 3. The Parallel Random Prism Architecture on a Hadoop cluster with two computational nodes, four Mappers and one Reducer.
unlabelled data instances. The basic steps of Parallel Random Prism are outlined in Algorithm 2.
Algorithm 2 : Parallel Random Prism Algorithm
Step 1: Distribute a copy of the training data to each node in the cluster using
the Hadoop Distributed File System;
Step 2: Start k Mappers, where k is the number of R-PrismTCS classifiers
desired. Each Mapper performs the following steps, in order:
- Build a training and validation set using Bagging;
- Induce a ruleset by applying R-PrismTCS on the training data;
- Calculate the ruleset's weight using the validation data;
- Send the ruleset and its weight to the Reducer;
Step 3: Optionally, the Reducer applies a filter that eliminates the weakest
and retains the strongest rulesets according to their weights;
Step 4: The Reducer returns the final classifier, a set of R-PrismTCS
rulesets that classifies each new unlabelled data instance by weighted
majority voting;
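The map/reduce flow of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the R-PrismTCS rule inducer is replaced by a hypothetical majority-class stub (`induce_ruleset`), and the k Mappers run sequentially in one process rather than on a Hadoop cluster, so only the bagging, weighting, filtering, and weighted-voting logic is shown.

```python
import random
from collections import Counter

def bagging_sample(data, rng):
    """Draw a bootstrap sample; instances never drawn form the validation set."""
    n = len(data)
    drawn = [rng.randrange(n) for _ in range(n)]
    train = [data[i] for i in drawn]
    validation = [data[i] for i in range(n) if i not in set(drawn)]
    return train, validation

def induce_ruleset(train):
    """Stand-in for R-PrismTCS: predicts the majority class of its sample."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda instance: majority

def mapper(data, seed):
    """One Mapper: bag the data, induce a ruleset, weight it on validation data."""
    rng = random.Random(seed)
    train, validation = bagging_sample(data, rng)
    ruleset = induce_ruleset(train)
    correct = sum(1 for x, y in validation if ruleset(x) == y)
    weight = correct / len(validation) if validation else 0.0
    return ruleset, weight

def reducer(weighted_rulesets, keep=None):
    """Reducer: optionally keep only the strongest rulesets, then build the
    final classifier, which votes with each ruleset weighted by its accuracy."""
    ranked = sorted(weighted_rulesets, key=lambda rw: rw[1], reverse=True)
    if keep is not None:
        ranked = ranked[:keep]
    def classify(instance):
        votes = Counter()
        for ruleset, weight in ranked:
            votes[ruleset(instance)] += weight
        return votes.most_common(1)[0][0]
    return classify

# Usage: 20 labelled instances, k = 5 "Mappers", keep the 3 strongest rulesets.
data = [((i,), "pos" if i % 2 else "neg") for i in range(20)]
ensemble = reducer([mapper(data, seed) for seed in range(5)], keep=3)
prediction = ensemble((7,))
```

In the real system, each `mapper` call would be an independent Hadoop Mapper reading its copy of the training data from HDFS, and the single Reducer would receive only the (ruleset, weight) pairs.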
4 Theoretical Analysis of Parallel Random Prism
The complexity of PrismTCS is governed by the number of probability calculations
performed for candidate split values; in this paper this count is denoted the number of cutpoints.
In the ideal case, there would be one feature that perfectly separates all the
classes, or simply all data instances would belong to the same class. An average
 