case is difficult to estimate, as the number of iterations of PrismTCS is dependent
on the number of rules and rule terms induced, which in turn are dependent on
the concept encoded in the training data. However, it is possible to estimate
the worst case, assuming that N is the number of instances and M the number
of features in the training data. Furthermore a categorical feature will occur at
most in only one term per rule, whereas a continuous feature may occur in two
terms per rule, as two rule terms can describe any value interval in a continuous
feature. Thus in the worst case all features are continuous and all rules have
2 · M terms. Also in the worst case each (with the exception of 1) instance is
encoded in a separate rule, which will lead to N − 1 rules in total. The − 1 is
because if there is only one instance left in step 6 of the PrismTCS pseudocode,
then there is no need to generate a further rule for it. The complexity (number
of cutpoint calculations) of inducing the r-th rule is 2 · M · (N − r). The factor
(N − r) is the number of training instances not covered by the rules induced
so far, as mentioned above, in the worst case each rule covers only one training
instance. These uncovered instances are used for the induction of the next rule.
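As a minimal sketch of this counting argument (the function name and interface are illustrative, not part of the original work), the worst-case cost of inducing the r-th rule follows directly from the two assumptions of 2 · M terms per rule and one covered instance per rule:

```python
def worst_case_rule_cost(N: int, M: int, r: int) -> int:
    """Worst-case number of cutpoint calculations for the r-th rule.

    Assumes all M features are continuous, so a rule has up to 2 * M terms
    (two terms per feature to bound a value interval), and that each rule
    induced so far covered only one training instance, leaving roughly
    N - r candidate cutpoints to evaluate per term.
    """
    terms_per_rule = 2 * M
    return terms_per_rule * (N - r)


# For the first rule (r = 1) on N = 1000 instances with M = 10 features:
# worst_case_rule_cost(1000, 10, 1) == 2 * 10 * 999 == 19980
```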
For example, the number of cutpoint calculations for a term of the first rule
(r = 1), where the training data is still of size N, would be (N − 1). The
total number of cutpoint calculations for the whole rule in this case (r = 1) would
be 2 · M · (N − 1), as there are 2 · M rule terms. This, summed up for the whole number
of rules, leads to:
\[
T_{PrismTCS} = \sum_{r=1}^{N-1} (2 \cdot M) \cdot (N - r) = 2 \cdot M \cdot \frac{N \cdot (N - 1)}{2}
\]
which is equivalent to a complexity of O(N² · M). Please note that this estimate
for the worst case is very pessimistic and unlikely to happen. In reality larger
datasets often contain many fewer rules than there are data instances [21]. This
is because Random Prism is a stable classifier due to the usage of R-PrismTCS as
the base classifier and bagging. Also, as stated before in Sect. 2, Random Prism employs
J-pruning, which further reduces the number of rule terms per rule [19]. Hence,
in this case linearity can be exhibited, as the worst-case O(N² · M) is reduced to
O(R · N · M), where R is the total number of rules. An empirical study presented
in [21] suggests that the number of rules and rule terms induced does not increase
linearly with the number of instances. Also, the results in [19] suggest a more
linear scalability of PrismTCS.
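The following sketch (hypothetical helper names, not from the paper) checks the summation above against its closed form and illustrates how a rule count R much smaller than N − 1 moves the estimate away from the quadratic worst case:

```python
def worst_case_total(N: int, M: int) -> int:
    """T_PrismTCS in the worst case: sum of 2 * M * (N - r) for r = 1 .. N - 1."""
    return sum(2 * M * (N - r) for r in range(1, N))


def closed_form(N: int, M: int) -> int:
    """Closed form of the same sum: 2 * M * N * (N - 1) / 2 = M * N * (N - 1)."""
    return M * N * (N - 1)


def reduced_total(N: int, M: int, R: int) -> int:
    """Same counting argument, but with only R rules induced (R much smaller than N)."""
    return sum(2 * M * (N - r) for r in range(1, R + 1))


if __name__ == "__main__":
    N, M = 10_000, 20
    assert worst_case_total(N, M) == closed_form(N, M)   # O(N^2 * M) behaviour
    print(closed_form(N, M))           # 1999800000 cutpoint calculations
    print(reduced_total(N, M, R=50))   # 19949000, roughly 2 * M * R * N
```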
Assuming on average a linear complexity of O(N · M) for PrismTCS, the complexity
of Random Prism with respect to N and M is a product of four factors. The factors
are PrismTCS's complexity O(N · M), the average percentage of features f considered
by R-PrismTCS (this is a diminishing factor ranging between 0 and 1), the number of
classifiers b (which is an increasing factor, a whole number of at least 1), and a
further diminishing factor d, which reflects the decrease in the number of rules caused
by having repeated instances in the training data for each R-PrismTCS classifier. This
leads to O(N · M) · f · b · d.
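A short sketch of the resulting four-factor estimate; the function and the concrete values of f, b and d below are hypothetical illustrations, not measurements reported in the paper:

```python
def random_prism_cost(N: int, M: int, f: float, b: int, d: float) -> float:
    """Rough cost estimate O(N * M) * f * b * d for Random Prism.

    N, M : number of training instances and features
    f    : average fraction of features considered by R-PrismTCS (0 < f <= 1)
    b    : number of base classifiers (whole number >= 1)
    d    : diminishing factor for repeated (bagged) training instances (0 < d <= 1)
    """
    return (N * M) * f * b * d


# Hypothetical illustration of why 100 base classifiers need not mean a
# 100-fold runtime: f and d are both smaller than 1.
serial = random_prism_cost(N=10_000, M=20, f=1.0, b=1, d=1.0)
ensemble = random_prism_cost(N=10_000, M=20, f=0.5, b=100, d=0.6)
print(ensemble / serial)  # 30.0 instead of 100.0
```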
As pointed out in Sect. 2, one would intuitively expect the runtime of Random Prism
to be 100 times longer than that of the serial PrismTCS classifier, assuming 100 base
classifiers are induced. Yet the results in [20] show that the runtimes are