objectives and requirements followed by the definition of a data
mining problem, project planning, and an assessment of effort.
CRISP-DM distinguishes between business goals and data mining
goals. Business goals are stated in business terms, for example,
“reduce the cost of fraud in insurance claims.” Data mining goals are
stated in technical terms, for example, “determine which factors
(attributes) occur together on a claim form, combined with submitter
demographics, to identify fraudulent claims; then predict which
claims are fraudulent and order by predicted fraud monetary value.”
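The data mining goal above can be made concrete with a minimal sketch. It assumes that "predicted fraud monetary value" means the claim amount weighted by a model's fraud probability; that reading, the `Claim` fields, the 0.5 threshold, and the sample figures are all hypothetical and are not taken from CRISP-DM or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    amount: float             # claimed monetary value
    fraud_probability: float  # hypothetical model score in [0, 1]


def rank_by_expected_fraud_value(claims, threshold=0.5):
    """Keep claims predicted fraudulent and order them by expected fraud value.

    Expected fraud value is assumed to be amount * fraud_probability.
    """
    flagged = [c for c in claims if c.fraud_probability >= threshold]
    return sorted(
        flagged,
        key=lambda c: c.amount * c.fraud_probability,
        reverse=True,
    )


# Illustrative data only.
claims = [
    Claim("A", 1000.0, 0.9),
    Claim("B", 20000.0, 0.6),
    Claim("C", 5000.0, 0.2),   # below the threshold, so not flagged
    Claim("D", 3000.0, 0.8),
]
ranked = rank_by_expected_fraud_value(claims)
print([c.claim_id for c in ranked])
```

Ordering by expected value rather than by probability alone puts a large claim with a moderate score ahead of a small claim with a high score, which matches the business goal of reducing the cost of fraud.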
This first phase is the most important, and often the most
challenging. Without a clear understanding of what problem needs
to be solved and how results will be used, expectations may be
fuzzy and unrealistic. Business and technical people typically work
together to define the problem and how it can best be approached.
Some problems, such as campaign response modeling, can be easy
to define. Others, such as the “churn” problem discussed in
Chapter 2, require a deeper assessment of what is being predicted
(has the customer churned?), because the answer is not always
immediately clear. For
example, does churn only occur when a customer has terminated
all service, some service, or merely reduced minutes of use? In
these cases, domain and business expertise is necessary to provide
a precise definition of the problem.
Another aspect of the business understanding phase is
identifying available resources: human, hardware, software, and
data. Knowing which domain and technical experts can be drawn
upon to work on the problem is an initial step. However, available
computing resources, appropriate software, and access to needed
data sources can make or break a data mining project and need to be
assessed early on.
Since most data mining projects are expected to deliver a significant
return on investment (ROI), defining those expectations up front is key.
Ensuring that costs are properly balanced with expected benefits
avoids false starts or inflated expectations. It is not uncommon for
large-scale data mining projects to result in cost savings or
increased profits of tens of millions of dollars, with only a small
percentage of that amount spent on the data mining project itself.
IDC, an analyst organization covering the information technology and
telecommunications
industries, notes that both predictive and nonpredictive analytics
projects yield high median ROI—with predictive analytics topping
out at 145 percent [IDC 2003]. IDC also notes that predictive analytics
projects dramatically improve business processes with an emphasis
on the quality of operational decisions.
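As a rough illustration of how an ROI expectation might be written down before a project starts, the sketch below uses the conventional definition of ROI as net gain divided by cost. The text does not state how IDC computes its figures, so the formula, the function name, and the dollar amounts here are assumptions chosen only for illustration.

```python
def roi_percent(benefit: float, cost: float) -> float:
    """Return ROI as a percentage, assuming ROI = (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) * 100.0 / cost


# Hypothetical project figures, not taken from any study.
project_cost = 200_000.0
expected_benefit = 500_000.0
print(f"Expected ROI: {roi_percent(expected_benefit, project_cost):.0f} percent")
```

Stating the expected benefit and cost explicitly in this way makes it easier to check later whether the project met the expectations set during business understanding.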