predictive performance. Analysts can specify options that control the extent and
the severity of the pruning.
The CHAID algorithm is a powerful and efficient decision tree technique
that can also split a node into more than two branches. It is based on the
chi-square statistical test of independence of two categorical fields. CHAID
stands for Chi-square Automatic Interaction Detector. In the CHAID model, the
chi-square test is used to examine whether the output and the evaluated
predictor are independent. At each branch, all predictors are evaluated for
splitting according to this test. The most significant predictor, that is, the
predictor with the smallest p-value (observed significance level) on the
respective chi-square test, is selected for splitting, provided that this
p-value is below a specified threshold (the significance level of the test).
Before evaluating predictors for splitting, the following actions take place:
1. Continuous predictors are discretized into bands of equal size, typically 10 groups
of 10% each, and recoded to categorical fields with ordered categories.
2. Predictors are regrouped and categories that do not differ with respect to the
outcome are merged. This regrouping of predictor categories is also based on
relevant chi-square tests of independence.
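The banding step and the split selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the IBM SPSS Modeler implementation: the function names (chi2_sf, chi2_independence, equal_frequency_bands, select_split), the 0.05 threshold, and the toy data are all hypothetical. The sketch skips the category-merging step and any p-value adjustment a real CHAID implementation may apply, and its banding does not keep tied values together.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-h) * sum_{i < df/2} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: start from the df = 1 case and add the recurrence terms.
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_independence(predictor, outcome):
    """Pearson chi-square test of independence: (statistic, df, p-value)."""
    n = len(predictor)
    rows = Counter(predictor)
    cols = Counter(outcome)
    cells = Counter(zip(predictor, outcome))
    stat = 0.0
    for r, r_n in rows.items():
        for c, c_n in cols.items():
            expected = r_n * c_n / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p


def equal_frequency_bands(values, n_bands=10):
    """Recode a continuous field into ordered bands of roughly equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bands = [0] * len(values)
    for rank, i in enumerate(order):
        bands[i] = rank * n_bands // len(values)
    return bands


def select_split(predictors, outcome, alpha=0.05):
    """Return (name, p-value) of the most significant predictor, or None."""
    best = None
    for name, values in predictors.items():
        _, _, p = chi2_independence(values, outcome)
        if best is None or p < best[1]:
            best = (name, p)
    if best is not None and best[1] < alpha:
        return best
    return None


# Toy data (hypothetical): "region" separates the outcome, "gender" does not.
outcome = ["yes"] * 4 + ["no"] * 4
predictors = {
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
    "region": ["a", "a", "a", "a", "b", "b", "b", "b"],
}
chosen = select_split(predictors, outcome)
```

Here select_split compares predictors on their p-values rather than on the raw chi-square statistics, because predictors with different numbers of categories have different degrees of freedom. A continuous field would first be passed through equal_frequency_bands so that it can enter the same test as a categorical field.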
In all the above models (C&RT, C5.0, CHAID), analysts can specify in
advance the minimum number of records in the child nodes to ensure a minimum
support level for the resulting rules.
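The minimum child node size acts as a simple filter on candidate splits. The sketch below is a hypothetical illustration of that check; the function name meets_minimum_support and its parameters are assumptions made for this example and do not correspond to any option name in a particular tool.

```python
from collections import Counter


def meets_minimum_support(split_values, min_child_records):
    """True if every child node of the candidate split has enough records.

    split_values holds, for each record in the parent node, the category
    (child node) that the candidate split would send the record to.
    """
    child_sizes = Counter(split_values)
    return all(size >= min_child_records for size in child_sizes.values())


# Hypothetical candidate split: child "c" would receive a single record.
candidate = ["a", "a", "a", "b", "b", "b", "c"]
accepted = meets_minimum_support(candidate, min_child_records=2)
```

A split rejected by this check would produce at least one rule backed by too few records, which is the situation the minimum support setting is meant to prevent.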
In the context of this topic we will only present the recommended options for
the CHAID algorithm.
Recommended CHAID Options
Figures 3.29-3.31 and Table 3.14 present and explain the recommended CHAID
options for fine-tuning the model in IBM SPSS Modeler and in any other data
mining software that offers this technique.
Table 3.14 IBM SPSS Modeler recommended CHAID options.
Option | Setting | Functionality/reasoning for selection
Method | CHAID | This option determines the tree growth method. CHAID is the IBM SPSS Modeler default option and the tree growing method recommended to start with in most classification tasks