classifiers [Banfield et al. (2007)]. Statistical tests were performed on
experimental results from 57 publicly available datasets. When cross-
validation comparisons were tested for statistical significance, the best
method was statistically more accurate than bagging on only eight of the
57 datasets. Alternatively, examining the average ranks of the algorithms
across the group of datasets, Banfield found that boosting, random forests,
and randomized trees are statistically significantly better than bagging.
9.10 Open Source for Decision Tree Forests
There are two open source software packages which can be used for creating
decision tree forests. Both systems are free and are distributed under
the terms of the GNU General Public License.
The OpenDT [Banfield (2005)] package can output trees very similar
to those of C4.5, but adds functionality for ensemble creation. If the
randomly chosen attribute set yields a negative information gain, the
OpenDT approach is to randomly re-choose attributes until a positive
information gain is obtained, or no further split is possible. This ensures
that each test improves the purity of the resulting leaves. The system is
written in Java.
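The re-choosing step described above can be sketched as follows. This is a minimal illustration, not OpenDT's actual code (which is in Java): the `entropy`, `info_gain`, and `choose_split` helpers are hypothetical names, and the sketch draws random attribute subsets until one attribute gives positive information gain, or reports that no further split is possible.

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, labels, attr):
    """Information gain from splitting `rows` on attribute index `attr`."""
    total = len(rows)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def choose_split(rows, labels, n_attrs, k, rng=random):
    """OpenDT-style selection sketch: draw a random subset of k attributes;
    if none yields positive gain, discard it and re-choose from the
    remaining attributes until a useful split is found or none is possible."""
    remaining = set(range(n_attrs))
    while remaining:
        subset = rng.sample(sorted(remaining), min(k, len(remaining)))
        gains = {a: info_gain(rows, labels, a) for a in subset}
        best = max(gains, key=gains.get)
        if gains[best] > 0:
            return best          # this test improves leaf purity
        remaining -= set(subset)  # exhausted subset: re-choose
    return None                   # no attribute helps; make a leaf
```

On a toy dataset where one attribute perfectly predicts the class, the loop always ends up returning that attribute even if an uninformative subset is drawn first; on a pure node it returns `None`.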
The Weka package [Frank et al. (2005)] is an organized collection of state-
of-the-art machine learning algorithms and data preprocessing tools. The
basic way of interacting with these methods is by invoking them from the
command line. However, convenient interactive graphical user interfaces
are provided for data exploration, for setting up large-scale experiments
on distributed computing platforms, and for designing configurations
for streamed data processing. These interfaces constitute an advanced
environment for experimental data mining. Weka includes many decision
tree learners: decision stumps, ID3, a C4.5 clone called "J48", trees
generated by reduced-error pruning, alternating decision trees, and
random trees, as well as ensemble methods over them, including random
forests, bagging, boosting, and stacking.
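As an illustration of the command-line interaction mentioned above, a J48 tree can be trained and evaluated in one invocation. This is a usage sketch: it assumes `weka.jar` is on the classpath and that a local ARFF dataset (here a hypothetical `iris.arff`) exists.

```shell
# Train a J48 (C4.5 clone) decision tree on iris.arff and report
# cross-validation accuracy; -t names the training file.
java -cp weka.jar weka.classifiers.trees.J48 -t iris.arff
```

The same class names can be swapped for the other learners listed above (e.g. `weka.classifiers.trees.RandomForest` or `weka.classifiers.meta.Bagging`).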