usually produces small trees, but it does not always produce the
smallest possible tree.
(3) ID3 is designed for nominal attributes. Therefore, continuous data can
be used only after converting them to nominal bins.
Because of these drawbacks, most practitioners prefer C4.5 to ID3:
C4.5 is an evolution of the ID3 algorithm that attempts to address
them.
7.3 C4.5
C4.5, an evolution of ID3 presented by the same author [Quinlan (1993)],
uses gain ratio as its splitting criterion. Splitting ceases when the number
of instances to be split is below a certain threshold. Error-based pruning
is performed after the growing phase. C4.5 can handle numeric attributes.
It can also induce from a training set that incorporates missing values by
using the corrected gain ratio criterion described in Section 5.1.8.
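
To make the splitting criterion concrete, the following minimal Python sketch computes the gain ratio of a single nominal attribute, using the standard definition (information gain divided by split information). It is an illustration only, not Quinlan's implementation: the function names and the toy data are hypothetical, and the sketch ignores missing values, so it does not include the correction of Section 5.1.8.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(
        len(p) / total * log2(len(p) / total) for p in partitions.values()
    )
    # An attribute with a single observed value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the attribute separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
ratio = gain_ratio(outlook, play)
```

Dividing by the split information penalizes attributes that fragment the data into many small partitions, which plain information gain tends to favor.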
The C4.5 algorithm provides several improvements over ID3. The most
important improvements are:
(1) C4.5 uses a pruning procedure that removes branches that do not
contribute to the accuracy and replaces them with leaf nodes.
(2) C4.5 allows attribute values to be missing (marked as ?).
(3) C4.5 handles continuous attributes by splitting the attribute's value
range into two subsets (binary split). Specifically, it searches for the
best threshold that maximizes the gain ratio criterion. All values above
the threshold constitute the first subset and all other values constitute
the second subset.
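
The threshold search in item (3) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual C4.5 code: the function names are hypothetical, candidate thresholds are taken as midpoints between consecutive distinct sorted values (an assumption, since the text does not say how candidates are generated), and missing values are not handled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) for the best binary split, or (None, 0.0)."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    total = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between two equal values
        below = [label for _, label in pairs[:i]]   # values <= threshold
        above = [label for _, label in pairs[i:]]   # values > threshold
        p = i / total
        gain = base - p * entropy(below) - (1 - p) * entropy(above)
        split_info = -(p * log2(p) + (1 - p) * log2(1 - p))
        ratio = gain / split_info
        if ratio > best[1]:
            # Midpoint candidate: an assumption of this sketch.
            best = ((low + high) / 2, ratio)
    return best


# Hypothetical toy data: the classes separate cleanly between 68 and 69.
temperature = [64, 65, 68, 69, 70, 71]
play = ["yes", "yes", "yes", "no", "no", "no"]
threshold, ratio = best_threshold(temperature, play)
```

Only boundaries between distinct sorted values need to be examined, so an attribute with n distinct values yields at most n - 1 candidate thresholds.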
C5.0 is an updated, commercial version of C4.5 that offers a number
of improvements: it is claimed that C5.0 is much more efficient than C4.5
in terms of memory and computation time. In certain cases, it provides an
impressive speedup, from an hour and a half (the time taken by C4.5)
to only 3.5 seconds. Moreover, it supports the boosting procedure that can
improve predictive performance and is described in Section 9.4.1.
J48 is an open source Java implementation of the C4.5 algorithm in
the Weka data mining tool (see Section 10.2 for additional information).
Because J48 is merely a reimplementation of C4.5, it is expected
to perform similarly to C4.5. Nevertheless, a recent comparative study that