Consider, for instance, the effect of marital status on churn. The marital status field includes four categories: single, married, divorced, and widowed. The algorithm, based on its splitting criteria, concludes that only single customers behave differently with respect to churn. It therefore regroups the marital status categories, and if this field is selected for splitting, it produces a binary split that separates single customers from the rest. This regrouping simplifies the generated model and allows analysts to focus on the groups that genuinely differ with respect to the output.
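The sketch below illustrates one way such regrouping can be approximated in Python; it is not the exact procedure of any particular tree algorithm. The DataFrame df and the column names marital_status and churn are hypothetical. The function greedily merges the pair of category groups whose target distributions are most alike (highest chi-square p-value) and stops once every remaining pair differs significantly.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def regroup_categories(df, predictor, target, alpha=0.05):
    """Greedily merge the two most similar category groups (highest
    chi-square p-value against the target) until all remaining groups
    differ significantly at level alpha, or only two groups are left."""
    groups = {cat: [cat] for cat in df[predictor].dropna().unique()}
    while len(groups) > 2:
        keys = list(groups)
        best_pair, best_p = None, -1.0
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                subset = df[df[predictor].isin(groups[a] + groups[b])]
                table = pd.crosstab(subset[predictor].isin(groups[a]),
                                    subset[target])
                _, p, _, _ = chi2_contingency(table)
                if p > best_p:
                    best_pair, best_p = (a, b), p
        if best_p <= alpha:          # every remaining pair already differs
            break
        a, b = best_pair             # merge the most similar pair of groups
        groups[a] = groups[a] + groups.pop(b)
    return list(groups.values())

# For the marital status example this might return, e.g.,
# [['single'], ['married', 'divorced', 'widowed']]
```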
Decision tree models discretize continuous predictors by collapsing their values into ordered categories before evaluating them for possible splitting. The respective fields are thus transformed into ordinal categorical fields, that is, fields with ordered categories. As an example, let us review the handling of the continuous field representing the number of SMS messages in the telecommunications cross-selling exercise presented above. A threshold of 84 SMS messages was identified, and the respective split partitioned customers into two groups: those with more than 84 SMS messages per month and those with fewer.
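As a rough illustration (not the exact procedure used above), a depth-one decision tree can be fitted on a single continuous field to expose the threshold it chooses for the binary split. The DataFrame df and the columns sms_per_month and bought_addon are hypothetical stand-ins for the cross-selling data.

```python
from sklearn.tree import DecisionTreeClassifier

# Fit a single-split ("stump") tree on the hypothetical SMS usage field.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(df[["sms_per_month"]], df["bought_addon"])

# The root node's threshold is the cut point of the binary split; records
# with sms_per_month <= threshold go to the left branch.
print("split threshold:", stump.tree_.threshold[0])
```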
Developing Stable and Understandable Decision Tree Models
In the case of decision tree models, ''less is more'': the simplicity of the generated rules is a factor to consider alongside predictive ability. The number of tree levels should rarely be set above five or six. In cases where decision trees are mainly applied for profiling and for explaining a particular outcome, this setting should be kept even lower (for instance, by requesting three levels) in order to provide a concise and readable rule set that illuminates the associations between the target and the inputs.
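In a library such as scikit-learn, this restriction corresponds to the max_depth parameter. The feature matrix X and target y below are hypothetical; the sketch simply contrasts a shallow profiling tree with a somewhat deeper scoring tree and prints the rules of the former.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Three levels for a concise, readable profiling rule set.
profiling_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
# Five to six levels is usually a sensible upper bound for scoring models.
scoring_tree = DecisionTreeClassifier(max_depth=6, random_state=0)

profiling_tree.fit(X, y)
scoring_tree.fit(X, y)

# Print the generated rules of the shallow tree in plain text.
print(export_text(profiling_tree, feature_names=list(X.columns)))
```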
A crucial aspect to consider in the development of decision tree models
is their stability and their ability to capture general patterns, and not patterns
pertaining only to the particular training dataset. In general, the impurity
decreases as the tree size increases. Decision trees, if not restricted with
appropriate settings, may continue to grow until they reach perfect separation,
even if they end up with terminal nodes containing only a handful of records.
But will a rule founded on the behavior of two or three records work well on
new data? Most likely not, so it is crucial that data miners also take into account the support of each rule, that is, the rule's coverage or ''how many cases constitute the rule.''
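One common way to enforce a minimum support, illustrated below with scikit-learn (the X and y objects are again hypothetical), is to set a minimum number of records per terminal node and then inspect how many training cases fall under each rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Require at least 50 records in every terminal node so that no rule
# rests on just a handful of cases.
tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=50, random_state=0)
tree.fit(X, y)

# tree.apply returns the terminal node reached by each training record;
# counting them gives the support (coverage) of every rule.
leaf_ids = tree.apply(X)
support = np.bincount(leaf_ids, minlength=tree.tree_.node_count)
for node, count in enumerate(support):
    if tree.tree_.children_left[node] == -1 and count > 0:  # leaves only
        print(f"rule ending at node {node}: support = {count} records")
```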
Maximum purity and the correspondingly high confidence scores are not the only things that data miners should consider. They should also avoid models that achieve near-perfect separation on the training data yet fail to generalize to new records.