CLUSTER PROFILING AND SCORING WITH SUPERVISED MODELS
Apart from tables and charts of descriptive statistics, the cluster profiling process
can also involve appropriate supervised models. Data miners can build classification
models such as decision trees, with the cluster membership field as the target and
all fields of interest as inputs, to gain more insight into the revealed clusters.
These models can jointly assess many attributes and reveal those which best
characterize each cluster. In the next section we will present a brief introduction
to decision trees. Because of their transparency and the intuitive form of their
results, decision trees are commonly used to understand the structure of the
revealed clusters.
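As a rough illustration of profiling with the cluster membership field as the target, the sketch below ranks input fields by how well a single threshold split on each one separates the clusters. It is a one-level tree (a decision stump) scored by Gini impurity, standing in for a full decision tree algorithm. The field names (`monthly_spend`, `visits`), the records, and the cluster labels are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, field):
    """Return (weighted_impurity, threshold) of the best 'field <= t' split.

    The threshold is None when no split improves on the unsplit data.
    """
    values = sorted({row[field] for row in rows})
    best = (gini(labels), None)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(rows, labels) if row[field] <= t]
        right = [lab for row, lab in zip(rows, labels) if row[field] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, t)
    return best

# Hypothetical customer records and the cluster each one was assigned to.
rows = [
    {"monthly_spend": 20, "visits": 1},
    {"monthly_spend": 25, "visits": 8},
    {"monthly_spend": 30, "visits": 2},
    {"monthly_spend": 200, "visits": 7},
    {"monthly_spend": 220, "visits": 1},
    {"monthly_spend": 250, "visits": 9},
]
clusters = ["A", "A", "A", "B", "B", "B"]

# Fields whose best split leaves the least impurity characterize the clusters best.
ranking = sorted(rows[0], key=lambda f: best_split(rows, clusters, f)[0])
print(ranking)  # ['monthly_spend', 'visits']
```

In this toy data a split on `monthly_spend` separates the two clusters completely, while no split on `visits` does, so `monthly_spend` is ranked first. A real decision tree would repeat this search recursively within each resulting subgroup.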
Decision trees can also be used as a scoring model for allocating new records
to established clusters. They translate the differentiating characteristics of
each cluster into a set of simple, understandable rules which can subsequently
be applied to classify new records into the revealed clusters. Although this
approach introduces a source of error due to possible weaknesses in the derived
decision tree model, it is a more transparent approach to cluster updating. It
is based on understandable, model-driven rules, similar to common business
rules, which can more easily be examined and communicated. Business users can
also intervene more easily and, if required, modify these rules and fine-tune
them according to their business expertise.
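The scoring idea can be sketched as a small rule table applied to new records. The rules, thresholds, field names, and cluster labels below are hypothetical; in practice they would be read off a fitted decision tree. Keeping the rules as plain data is one way to let a business user inspect and adjust them.

```python
import operator

# Comparison operators that may appear in a rule condition.
OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical rules: a list of (field, operator, threshold) conditions that
# must all hold, followed by the cluster assigned when they do.
RULES = [
    ([("monthly_spend", "<=", 115.0)], "A"),
    ([("monthly_spend", ">", 115.0), ("visits", "<=", 4.5)], "B"),
    ([("monthly_spend", ">", 115.0), ("visits", ">", 4.5)], "C"),
]

def assign_cluster(record, rules=RULES, default=None):
    """Return the cluster of the first rule whose conditions all hold."""
    for conditions, cluster in rules:
        if all(OPS[op](record[field], threshold)
               for field, op, threshold in conditions):
            return cluster
    return default

new_record = {"monthly_spend": 300, "visits": 2}
print(assign_cluster(new_record))  # B

# A business user could fine-tune a threshold without refitting any model,
# here by moving the hypothetical visits cutoff from 4.5 down to 1.5.
tuned_rules = [
    ([("monthly_spend", "<=", 115.0)], "A"),
    ([("monthly_spend", ">", 115.0), ("visits", "<=", 1.5)], "B"),
    ([("monthly_spend", ">", 115.0), ("visits", ">", 1.5)], "C"),
]
print(assign_cluster(new_record, tuned_rules))  # C
```

Any misclassification made by the underlying tree carries over to records scored this way, which is the source of error mentioned above.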
For all the above reasons, the next sections are dedicated to decision trees
since they make up an excellent supplement to cluster analysis.
AN INTRODUCTION TO DECISION TREE MODELS
Decision trees belong to the class of supervised algorithms and are one of the
most popular classification techniques. A decision tree consists of a set of rules,
expressed in plain English, that associate a set of conditions with a specific outcome.
These rules can also be represented in an intuitive tree format, enabling the
visualization of the relationships between the predictors and the output.
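To make the link between the tree format and the rule format concrete, the following sketch walks a small tree and emits one plain-English rule per leaf. The nested-dictionary layout, the field names, and the outcomes are assumptions made for illustration and do not come from any particular tool.

```python
def tree_to_rules(node, conditions=()):
    """Yield one rule per leaf by following each root-to-leaf path."""
    if "outcome" in node:  # leaf node: emit the accumulated conditions
        premise = " AND ".join(conditions) or "always"
        yield f"IF {premise} THEN {node['outcome']}"
        return
    field, t = node["field"], node["threshold"]
    # Left branch holds records with field <= t, right branch the rest.
    yield from tree_to_rules(node["left"], conditions + (f"{field} <= {t}",))
    yield from tree_to_rules(node["right"], conditions + (f"{field} > {t}",))

# Hypothetical two-level tree predicting a customer outcome.
tree = {
    "field": "tenure_months", "threshold": 12,
    "left": {"outcome": "churn"},
    "right": {
        "field": "complaints", "threshold": 2,
        "left": {"outcome": "stay"},
        "right": {"outcome": "churn"},
    },
}

rules = list(tree_to_rules(tree))
for rule in rules:
    print(rule)
# IF tenure_months <= 12 THEN churn
# IF tenure_months > 12 AND complaints <= 2 THEN stay
# IF tenure_months > 12 AND complaints > 2 THEN churn
```

Each leaf corresponds to exactly one rule, and the conditions of a rule are the tests met on the path from the root to that leaf.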
Decision trees are often used for insight and for the profiling of target events/
attributes due to their transparency and the explanation of the predictions that
they provide. They perform a kind of "supervised segmentation": they recursively
partition (split) the overall population into "pure" subgroups, that is, homogeneous