Table 3.14 (continued)
Option
Setting
Functionality/reasoning for selection
Alternatives are worth trying: for example, exhaustive CHAID, a modification of the CHAID algorithm that takes longer to train but often gives high-quality results. Users should also try other decision tree algorithms, C5.0 in particular.
Maximum tree depth/levels below root
3-6
This option determines the maximum allowable number of consecutive partitions of the data. Although this option also depends on the number of available records, users should try to achieve effective results without ending up with bushy and complicated trees.
For trees constructed mainly for profiling purposes, a depth of three to four levels is typically adequate.
For predictive purposes, a depth of five to six levels is sufficient; larger trees, even when the number of available records allows them, would probably produce complicated rules that are hard to examine and evaluate.
Alpha for splitting
0.05
This option determines the significance level for the chi-square statistical test used for splitting. A split is performed only if a significant predictor can be found, that is, only if the p-value of the corresponding statistical test is less than the specified alpha for splitting.
In plain language, increasing the alpha for splitting (normally up to 0.10) makes the test for splitting less strict and splitting easier, potentially resulting in larger trees. A tree that previously stopped growing because of non-significant predictors may grow further once the test's criteria are loosened. Lower values (normally down to 0.01) tend to give smaller trees.
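The effect of the alpha for splitting can be illustrated with a minimal sketch of the significance check on a single candidate split. It is not the CHAID implementation of any particular tool: it assumes a binary predictor and a binary target (a 2x2 table, one degree of freedom), uses the plain Pearson chi-square statistic, and leaves out the category merging and p-value adjustment that CHAID normally applies. The function names and the counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts.

    Simplifying assumption: binary predictor (rows) by binary target
    (columns), so the test has exactly 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if predictor and target were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def should_split(table, alpha=0.05):
    """Split only if the p-value is less than the alpha for splitting."""
    _, p_value = chi_square_2x2(table)
    return p_value < alpha


# Hypothetical counts: rows are predictor categories,
# columns are target outcomes (no, yes).
candidate = [[60, 40], [45, 55]]
stat, p_value = chi_square_2x2(candidate)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")

for alpha in (0.01, 0.05, 0.10):
    decision = "split" if should_split(candidate, alpha) else "stop"
    print(f"alpha = {alpha:.2f}: {decision}")
```

With these counts the p-value falls between 0.01 and 0.05, so the node is split at the default alpha of 0.05 and at 0.10 but not at 0.01, which matches the description above: a looser alpha lets the tree grow further, and a stricter one stops it earlier.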