Data Mining Techniques for Segmentation - Data Mining Techniques in CRM: Inside Customer Segmentation

predictive performance. Analysts can specify options that control the extent and

the severity of the pruning.

The CHAID algorithm is a powerful and efficient decision tree technique

which also produces multiple splits and is based on the chi-square statistical test of

independence of two categorical fields. CHAID stands for Chi-square Automatic

Interaction Detector. In the CHAID model, the chi-square test is used to examine

whether the output and the evaluated predictor are independent. At each branch,

all predictors are evaluated for splitting according to this test. The most significant

predictor, that is, the predictor with the smallest p-value (observed significance

level) on the respective chi-square test, is selected for splitting, provided of course

that the respective p-value is below a specified threshold (significance level of the

test). Before evaluating predictors for splitting, the following actions take place:

1. Continuous predictors are discretized into bands of equal size, typically 10 groups

of 10% each, and recoded as categorical fields with ordered categories.

2. Predictors are regrouped and categories that do not differ with respect to the

outcome are merged. This regrouping of predictor categories is also based on

relevant chi-square tests of independence.
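
The banding step and the chi-square selection rule described above can be illustrated with a minimal Python sketch. It is not the IBM SPSS Modeler implementation: the function names (decile_bands, chi2_sf, chi2_independence, best_split), the sample data and the 0.05 threshold are all assumptions made for illustration, and the category-merging step is left out for brevity.

```python
import math
from collections import Counter

def decile_bands(values, n_bands=10):
    """Equal-frequency banding: return a band index (0..n_bands-1) per value.
    Tied values may land in adjacent bands in this simple version."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bands = [0] * len(values)
    for rank, i in enumerate(order):
        bands[i] = rank * n_bands // len(values)
    return bands

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution,
    using the closed-form series for integer degrees of freedom."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

def chi2_independence(predictor, target):
    """Pearson chi-square test of independence of two categorical fields.
    Returns (statistic, degrees of freedom, p-value)."""
    n = len(predictor)
    cells = Counter(zip(predictor, target))
    rows, cols = Counter(predictor), Counter(target)
    stat = 0.0
    for r, r_n in rows.items():
        for c, c_n in cols.items():
            expected = r_n * c_n / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value

def best_split(predictors, target, alpha=0.05):
    """Pick the predictor with the smallest p-value, provided it is below alpha.
    alpha=0.05 is an illustrative threshold, not a recommended setting."""
    p_values = {name: chi2_independence(col, target)[2]
                for name, col in predictors.items()}
    name = min(p_values, key=p_values.get)
    return (name, p_values[name]) if p_values[name] < alpha else None

# Hypothetical data: 100 customers, a continuous field and a categorical field.
tenure = list(range(100))
churned = [1] * 50 + [0] * 50
gender = ["F", "M"] * 50
candidates = {"tenure_band": decile_bands(tenure), "gender": gender}
print(best_split(candidates, churned))
```

In this made-up example the banded tenure field separates the outcome perfectly while gender is unrelated to it, so tenure_band is the field selected for the split.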

In all the above models (C&RT, C5.0, CHAID), analysts can specify in

advance the minimum number of records in the child nodes to ensure a minimum

support level for the resulting rules.
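
The minimum child node size can be pictured as a simple admissibility check on a candidate split. The sketch below is a hedged illustration only: the function name split_is_admissible and the parameter min_child_records are hypothetical, and how a given tool enforces the limit internally is not specified here.

```python
from collections import Counter

def split_is_admissible(split_categories, min_child_records):
    """Return True if the split creates at least two child nodes and
    every child node holds at least min_child_records records.
    split_categories holds, for each record, the child node it would fall into."""
    sizes = Counter(split_categories)
    return len(sizes) > 1 and min(sizes.values()) >= min_child_records

# Hypothetical candidate splits of 100 records, with a minimum of 25 per child.
balanced = ["low"] * 30 + ["high"] * 70
lopsided = ["low"] * 5 + ["high"] * 95
print(split_is_admissible(balanced, 25), split_is_admissible(lopsided, 25))
```

A higher minimum yields fewer rules, each backed by more records, which is the support guarantee referred to above.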

In the context of this topic we will only present the recommended options for

the CHAID algorithm.

Recommended CHAID Options

Figures 3.29-3.31 and Table 3.14 present and explain the recommended CHAID

options for fine-tuning the model in IBM SPSS Modeler or in any other data mining

software that offers this technique.

Table 3.14 IBM SPSS Modeler recommended CHAID options.

Option | Setting | Functionality/reasoning for selection

Method | CHAID | This option determines the tree growth method. CHAID is the IBM SPSS Modeler default option and the tree growing method recommended to start with in most classification tasks
