case is difficult to estimate, as the number of iterations of PrismTCS is dependent
on the number of rules and rule terms induced, which in turn are dependent on
the concept encoded in the training data. However, it is possible to estimate
the worst case, assuming that N is the number of instances and M the number
of features in the training data. Furthermore a categorical feature will occur at
most in only one term per rule, whereas a continuous feature may occur in two
terms per rule, as two rule terms can describe any value interval in a continuous
feature. Thus in the worst case all features are continuous and all rules have
2 · M terms. Also in the worst case each (with the exception of 1) instance is
encoded in a separate rule, which will lead to N − 1 rules in total. The − 1 is
because if there is only one instance left in step 6 of the PrismTCS pseudocode,
then there is no need to generate a further rule for it. The complexity (number
of cutpoint calculations) of inducing the r-th rule is 2 · M · (N − r). The factor
(N − r) is the number of training instances not covered by the rules induced
so far, as mentioned above, in the worst case each rule covers only one training
instance. These uncovered instances are used for the induction of the next rule.
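As a minimal sketch of this counting argument (the function name and interface are illustrative, not part of the original work), the worst-case cost of inducing the r-th rule follows directly from the two assumptions of 2 · M terms per rule and one covered instance per rule:

```python
def worst_case_rule_cost(N: int, M: int, r: int) -> int:
    """Worst-case number of cutpoint calculations for the r-th rule.

    Assumes all M features are continuous, so a rule has up to 2 * M terms
    (two terms per feature to bound a value interval), and that each rule
    induced so far covered only one training instance, leaving roughly
    N - r candidate cutpoints to evaluate per term.
    """
    terms_per_rule = 2 * M
    return terms_per_rule * (N - r)


# For the first rule (r = 1) on N = 1000 instances with M = 10 features:
# worst_case_rule_cost(1000, 10, 1) == 2 * 10 * 999 == 19980
```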
For example, the number of cutpoint calculations for a term of the first rule
(r = 1), where the training data is still of size N, would be (N − 1). The
total number of cutpoint calculations for the whole rule in this case (r = 1) would
be 2 · M · (N − 1), as there are 2 · M rule terms. This, summed up for the whole number
of rules, leads to:
\[
T_{PrismTCS} = \sum_{r=1}^{N-1} (2 \cdot M) \cdot (N - r) = 2 \cdot M \cdot \frac{N \cdot (N - 1)}{2}
\]
which is equivalent to a complexity of O(N² · M). Please note that this estimate
for the worst case is very pessimistic and unlikely to happen. In reality larger
datasets often contain many fewer rules than there are data instances [21]. This
is because Random Prism is a stable classifier due to the usage of R-PrismTCS as
the base classifier and bagging. Also, as stated before in Sect. 2, Random Prism employs
J-pruning, which further reduces the number of rule terms per rule [19]. Hence,
in this case linearity can be exhibited, as the worst-case O(N² · M) is reduced to
O(R · N · M), where R is the total number of rules. An empirical study presented
in [21] suggests that the number of rules and rule terms induced does not increase
linearly with the number of instances. Also, the results in [19] suggest a more
linear scalability of PrismTCS.
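The following sketch (hypothetical helper names, not from the paper) checks the summation above against its closed form and illustrates how a rule count R much smaller than N − 1 moves the estimate away from the quadratic worst case:

```python
def worst_case_total(N: int, M: int) -> int:
    """T_PrismTCS in the worst case: sum of 2 * M * (N - r) for r = 1 .. N - 1."""
    return sum(2 * M * (N - r) for r in range(1, N))


def closed_form(N: int, M: int) -> int:
    """Closed form of the same sum: 2 * M * N * (N - 1) / 2 = M * N * (N - 1)."""
    return M * N * (N - 1)


def reduced_total(N: int, M: int, R: int) -> int:
    """Same counting argument, but with only R rules induced (R much smaller than N)."""
    return sum(2 * M * (N - r) for r in range(1, R + 1))


if __name__ == "__main__":
    N, M = 10_000, 20
    assert worst_case_total(N, M) == closed_form(N, M)   # O(N^2 * M) behaviour
    print(closed_form(N, M))           # 1999800000 cutpoint calculations
    print(reduced_total(N, M, R=50))   # 19949000, roughly 2 * M * R * N
```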
Assuming on average a linear complexity of O(N · M) for PrismTCS, the complexity
of Random Prism with respect to N and M is a product of four factors. The factors
are PrismTCS's complexity O(N · M), the average percentage of features f considered
by R-PrismTCS (this is a diminishing factor ranging between 0 and 1), the number of
classifiers b (which is an increasing factor, a whole number of at least 1), and a
further diminishing factor d, which reflects the decrease in the number of rules caused
by having repeated instances in the training data for each R-PrismTCS classifier. This
leads to O(N · M) · f · b · d.
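A short sketch of the resulting four-factor estimate; the function and the concrete values of f, b and d below are hypothetical illustrations, not measurements reported in the paper:

```python
def random_prism_cost(N: int, M: int, f: float, b: int, d: float) -> float:
    """Rough cost estimate O(N * M) * f * b * d for Random Prism.

    N, M : number of training instances and features
    f    : average fraction of features considered by R-PrismTCS (0 < f <= 1)
    b    : number of base classifiers (whole number >= 1)
    d    : diminishing factor for repeated (bagged) training instances (0 < d <= 1)
    """
    return (N * M) * f * b * d


# Hypothetical illustration of why 100 base classifiers need not mean a
# 100-fold runtime: f and d are both smaller than 1.
serial = random_prism_cost(N=10_000, M=20, f=1.0, b=1, d=1.0)
ensemble = random_prism_cost(N=10_000, M=20, f=0.5, b=100, d=0.6)
print(ensemble / serial)  # 30.0 instead of 100.0
```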
As pointed out in Sect. 2, one would intuitively expect the runtime of Random Prism
to be 100 times longer than that of the serial PrismTCS classifier, assuming 100 base
classifiers are induced. Yet the results in [20] show that the runtimes are