network is used with a switch in the centre node, which is the actual setup
we used for our empirical evaluation in Sect. 5 . In this case a multicast can be
used which transmits the training data from the original node only once to the
switch, which then multiplies the data and distributes them to each computing
node on a separate wire. Hence, in this case, we can ignore the multiplication of
$T_{comdat}$ with $p$, as in this case $p = 1$.
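To make this one-to-many distribution concrete, the following is a minimal, self-contained Java sketch of multicast delivery in general, not the authors' actual Hadoop data path: the sender transmits the training data once, and the switch replicates the datagram to every subscribed computing node. The group address, port, and payload are hypothetical, and a real deployment would split the data into datagram-sized chunks.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class TrainingDataMulticast {
    public static void main(String[] args) throws Exception {
        // Hypothetical administratively scoped multicast group and port.
        InetAddress group = InetAddress.getByName("230.0.0.1");
        int port = 4446;
        byte[] payload = "training-data-chunk".getBytes();

        // The data leaves the original node exactly once; the switch
        // replicates the datagram to each node that joined the group.
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, group, port));
        }
    }
}
```

Receivers would join the same group via java.net.MulticastSocket, so the sender's cost is independent of the number of computing nodes.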
$T_{asm,i}$ remains a computational bottleneck, which increases with the number of base classifiers. However, its computational requirements are relatively low even for large numbers of base classifiers, and it is not expected to have a large impact on $T_{total}(p)$. Nevertheless, it would be possible to parallelise $T_{asm,i}$, at least to a certain extent, by using multiple
Reducers executed on different cluster nodes. One Reducer per two Mappers
could combine the rule sets of these two Mappers. The Reducers' outputs (again rule sets) could then be combined similarly using further Reducers executed on different cluster nodes. This may be beneficial for very large numbers of base classifiers; the sketch below illustrates the pairwise combination.
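The following is a minimal sketch of this idea, assuming that assembling the ensemble is a simple union of rule sets (representing a rule set as a list of strings is a hypothetical stand-in). Each combine call on a given level could run as a separate Reducer on its own cluster node, so $b$ Mapper outputs are merged in roughly $\log_2 b$ parallel rounds instead of $b$ sequential steps.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleSetTreeReducer {
    // Assumed: assembling the final classifier is a simple union of rule sets.
    static List<String> combine(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }

    // Pairwise tree reduction: one Reducer per two Mapper outputs,
    // repeated level by level until a single rule set remains.
    static List<String> treeReduce(List<List<String>> mapperOutputs) {
        List<List<String>> level = mapperOutputs; // precondition: non-empty
        while (level.size() > 1) {
            List<List<String>> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    // Each combine could run on a different cluster node.
                    next.add(combine(level.get(i), level.get(i + 1)));
                } else {
                    next.add(level.get(i)); // odd one out moves up a level
                }
            }
            level = next;
        }
        return level.get(0);
    }
}
```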
The speed-up factor is a standard metric to evaluate the scalability of parallel algorithms with respect to the number of computing nodes or processors $p$ being used [13, 15]. It shows how much faster a parallel version of an algorithm is compared with its single-processor version. The generic formula for the speed-up $S(p)$ is:

$$S(p) = \frac{\text{runtime } T \text{ on 1 processor}}{\text{runtime } T \text{ on } p \text{ processors}}$$
For Parallel Random Prism the numerator of $S(p)$ can be substituted by $T_{total}(1)$ and the denominator of $S(p)$ can be substituted by $T_{total}(p)$. Thus the speed-up for Parallel Random Prism can be described by:
$$S(p) = \frac{T_{total}(1)}{T_{total}(p)} = \frac{T_{comdat} + \sum_{i=1}^{b} T_{sam,i} + \sum_{i=1}^{b} T_{cla,i} + \sum_{i=1}^{b} T_{comres,i} + \sum_{i=1}^{b} T_{asm,i}}{T_{comdat} \cdot p + \frac{\sum_{i=1}^{b} T_{sam,i}}{p} + \frac{\sum_{i=1}^{b} T_{cla,i}}{p} + \frac{\sum_{i=1}^{b} T_{comres,i}}{p} + \sum_{i=1}^{b} T_{asm,i}}$$
Again, what can be seen is that the only limiting factors are $T_{comdat} \cdot p$ and $\sum_{i=1}^{b} T_{asm,i}$ in the denominator of $S(p)$, as they are not parallelised. Yet, the time needed to execute $T_{asm,i}$ and $T_{comdat} \cdot p$ is negligibly small compared with the parallelised portions of Parallel Random Prism. Thus we can assume that the speed-up $S(p)$ will be close to the ideal case, which is $S(p) = p$. For example, if running Parallel Random Prism consumes 10000 ms on one node, then for using 4 nodes we would expect the runtime to be 2500 ms (4 times faster, assuming the ideal case), hence $S(4) = \frac{10000\,\mathrm{ms}}{2500\,\mathrm{ms}} = 4$.
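To make the cost model concrete, the sketch below evaluates $T_{total}(p)$ and $S(p)$ for illustrative per-task runtimes. The numeric values are entirely hypothetical, not measured; the names tComdat, tSam, tCla, tComres and tAsm mirror the terms of the formula above, with $T_{comdat}$ scaled by $p$ (the non-multicast case) and the assembly term left unparallelised.

```java
public class SpeedupModel {
    // Illustrative per-task runtimes in ms (hypothetical, not measured).
    static double tComdat = 200, tSam = 50, tCla = 400, tComres = 20, tAsm = 5;

    // T_total(p) following the formula above: communication scales with p,
    // sampling/classification/result-communication are divided by p,
    // and the assembly term is not parallelised.
    static double tTotal(int p, int b) {
        return tComdat * p + b * (tSam + tCla + tComres) / p + b * tAsm;
    }

    static double speedup(int p, int b) {
        return tTotal(1, b) / tTotal(p, b);
    }

    public static void main(String[] args) {
        int b = 100; // number of base classifiers
        for (int p : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("p=%2d  S(p)=%.2f%n", p, speedup(p, b));
        }
    }
}
```

Under these made-up figures the speed-up stays close to ideal for small $p$ but flattens as $T_{comdat} \cdot p$ grows, which is exactly the limiting behaviour described above.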
The formula for $S(p)$ above could also be used to determine the theoretical maximum speed-up, by building the derivative $S'(p)$, calculating its x-axis intercepts and then subsequently determining the global maxima of $S(p)$. However, we refrain from this step here.
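As a numeric stand-in for that analytic step, the fragment below simply scans $p$ for the maximum of $S(p)$, reusing the hypothetical SpeedupModel sketch from above (assumed to be in the same package); with those made-up cost figures the maximum lies at an interior value of $p$, illustrating why a maximum exists at all.

```java
public class MaxSpeedupScan {
    public static void main(String[] args) {
        int b = 100; // base classifiers, as in the previous sketch
        int bestP = 1;
        double bestS = 0;
        for (int p = 1; p <= 1024; p++) {
            double s = SpeedupModel.speedup(p, b); // reuses the cost model above
            if (s > bestS) { bestS = s; bestP = p; }
        }
        System.out.printf("theoretical max S(p) = %.2f at p = %d%n", bestS, bestP);
    }
}
```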
Next, Sect. 5 will provide an empirical analysis of Parallel Random Prism, supporting the theoretical analysis presented in this section.