Fig. 4. Size-up behaviour of Parallel Random Prism with respect to the number of training instances. Headings Test 1 and Test 2 refer to the test datasets in Table 1. These datasets have in this case been appended to themselves in order to increase the number of training instances while keeping the concept stable.
The first set of size-up experiments examines the algorithm's performance with respect to the number of data instances. For each dataset an initial sample of 10,000 instances was taken and then appended to itself in the vertical direction, as explained above. The runtime for the different data sizes has been recorded and is plotted against the data size in Fig. 4. Note that an initial sample of 10,000 instances may seem small; however, the use of 100 base classifiers multiplies the data held in memory, so that Parallel Random Prism in fact has to deal with 1,000,000 data instances for a 10,000-instance input sample.
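As an illustrative sketch only (not the authors' experimental harness), the following Python snippet shows how a sample can be appended to itself vertically to generate the size-up datasets, and how 100 bagged samples multiply the data the system must hold; the function names, the plain-list data representation, and the assumption that each bootstrap sample matches the input size are all hypothetical choices for illustration.

```python
import random

def append_vertically(sample, factor):
    """Concatenate the sample with itself 'factor' times, increasing the
    number of instances while keeping the underlying concept stable."""
    return sample * factor

def bagged_instance_count(num_instances, num_base_classifiers=100):
    """Total number of instances held in memory across all bagged samples,
    assuming each bootstrap sample has the same size as the input data."""
    return num_instances * num_base_classifiers

# The initial 10,000-instance sample (random feature vectors as placeholders)
initial_sample = [[random.random() for _ in range(5)] for _ in range(10000)]

# Doubling, tripling, ... the number of training instances for the size-up runs
for factor in range(1, 5):
    data = append_vertically(initial_sample, factor)
    total = bagged_instance_count(len(data))
    print(f"input instances: {len(data):>6}  "
          f"instances after bagging with 100 base classifiers: {total:>9}")
# For the 10,000-instance input this reproduces the 1,000,000 figure above.
```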
In general we observe a size-up that is close to linear with respect to the number of training instances. These results clearly support the theoretical average linear behaviour.
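For clarity, linear size-up can be stated as follows; this is the standard definition rather than a formula quoted from the paper. With the number of processors held fixed, if $T(n)$ denotes the runtime on $n$ training instances, then

$$\mathrm{sizeup}(k) \;=\; \frac{T(k \cdot n)}{T(n)} \;\approx\; k,$$

i.e. multiplying the amount of data by a factor $k$ increases the runtime by roughly the same factor.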
The second set of size-up experiments examines the algorithm's performance with respect to the number of features. Here the data has been appended to itself in the horizontal direction, as explained earlier in this section. Again, the number of training instances is multiplied by a factor of 100 through the use of 100 base classifiers. The runtime for the different data sizes has been recorded and is plotted against the data size in Fig. 5.
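Analogously, horizontal appending duplicates the feature columns of each instance rather than the instances themselves. The sketch below is again only an assumed illustration (hypothetical function name, list-of-lists representation), not the authors' code.

```python
def append_horizontally(sample, factor):
    """Duplicate each instance's feature vector 'factor' times, increasing
    the number of features while leaving the number of instances unchanged."""
    return [features * factor for features in sample]

# Tiny example: 3 instances with 2 features each, widened to 6 features each
tiny_sample = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
for features in append_horizontally(tiny_sample, 3):
    print(len(features), features)
```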
Note that this set of size-up experiments has no setup with only one cluster node. The reason is that we used the original number of features for both datasets, which simply exceeds the computational capabilities of a single cluster node once the bagging procedure for 100 base classifiers has been applied.
In general we observe a size-up that is close to linear with respect to the number of features as well. These results again clearly support the theoretical average linear behaviour.
The speed-up factors recorded for Parallel Random Prism, on both test
datasets and for different numbers of cluster nodes (up to the 10 available) are