Clustering models contain descriptions of groups of records that
share similar characteristics, such as the naturally occurring segments
in a customer database. Attribute importance models rank the
input attributes according to how well they are able to predict an outcome
or assist in defining clusters. Association models contain rules
for common co-occurrences in data, such as the determination that
customers who purchased products A and B also purchased product
C 90 percent of the time.
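
The "90 percent of the time" figure in such a rule is commonly called the rule's confidence: of the records that contain A and B, the fraction that also contain C. The following is a minimal sketch of that calculation, not the method of any particular data mining product. The function name `rule_confidence` and the sample baskets are hypothetical and chosen only to reproduce the 90 percent figure.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Return the fraction of transactions containing every antecedent
    item that also contain every consequent item."""
    antecedent = set(antecedent)
    consequent = set(consequent)
    # Keep only the transactions that satisfy the "if" side of the rule.
    matching = [t for t in transactions if antecedent <= set(t)]
    if not matching:
        return 0.0
    # Count how many of those also satisfy the "then" side.
    hits = sum(1 for t in matching if consequent <= set(t))
    return hits / len(matching)


# Hypothetical purchase data: ten baskets contain both A and B,
# and nine of those ten also contain C.
baskets = (
    [{"A", "B", "C"}] * 9
    + [{"A", "B"}]
    + [{"A"}, {"B", "C"}, {"C"}]
)

confidence = rule_confidence(baskets, {"A", "B"}, {"C"})
print(f"A and B -> C holds with confidence {confidence:.0%}")
```

Baskets that lack A or B do not affect the result, because the rule makes no claim about them.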
Data mining often is characterized as being predictive or descriptive
and supervised or unsupervised. The predictive nature of data mining
is that the models produced from historical data have the ability to
predict outcomes such as which customers are likely to churn, who
will be interested in a particular product, or which medications are
likely to affect the outcome of cancer treatment positively.
The descriptive nature of data mining is where the model itself is
inspected to understand the essence of the knowledge or patterns
found in the data. As in the regression example of Section 1.2.4, we
may be more interested in the trend of the data and hence knowing
that the slope of the line is positive is sufficient—as age increases,
Some models serve both predictive and descriptive purposes. For
example, a decision tree not only can predict outcomes, but also can
provide human interpretable rules that explain why a prediction was
made. Clustering models provide not only the ability to assign a
record to a cluster, but also a description of each cluster, either in the
form of a representative point called a centroid, or as a rule that
describes why a record is considered part of the cluster. These
concepts are explained more fully in Chapter 7.
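
To make the dual role of a clustering model concrete, the sketch below computes a centroid for each cluster (the descriptive side) and then assigns a new record to the cluster with the nearest centroid (the predictive side). It assumes numeric attributes and Euclidean distance, which is one common choice rather than the only one. The cluster names, the (age, monthly spend) records, and the helper functions `centroid` and `assign` are all hypothetical.

```python
import math


def centroid(points):
    """Return the attribute-wise mean of a list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def assign(record, centroids):
    """Return the name of the cluster whose centroid is closest to record
    (Euclidean distance is an assumption of this sketch)."""
    return min(centroids, key=lambda name: math.dist(record, centroids[name]))


# Hypothetical clusters of (age, monthly spend) records.
clusters = {
    "young_low_spend": [(22, 100), (25, 150), (28, 110)],
    "older_high_spend": [(55, 900), (60, 1000), (65, 1100)],
}

# Descriptive use: each centroid summarizes its cluster.
centroids = {name: centroid(pts) for name, pts in clusters.items()}

# Predictive use: place a previously unseen record in a cluster.
label = assign((58, 950), centroids)
print(centroids)
print(label)
```

In practice the attributes would usually be scaled first, since otherwise the attribute with the largest numeric range (here, spend) dominates the distance.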
Model transparency is the ability of a user to understand
how or why a given model makes certain predictions. Some
algorithms produce such models; other algorithms produce models
that are treated as “black boxes.” Neural networks are a good example
of opaque models that are used solely for their predictive capabilities.
A second characterization is supervised and unsupervised learning.
Supervised learning simply means that the algorithm requires that
its source data contain the correct answer for each record. This allows
some algorithms (e.g., decision trees and neural networks) to make
corrections to a model to ensure that it can get as many of the
answers correct as possible. The correct answers supervise the learning
process by pointing out mistakes when the algorithm uses the
model to predict outcomes. Other algorithms (e.g., naïve Bayes) use