likely, or unlikely, to attrite. Using the data mining classification
function, ABCBank can predict customers who are likely to attrite
and understand the characteristics, or profiles, of such customers.
Gaining a better understanding of customer behavior enables
ABCBank to develop business plans to retain customers.

Classification is used to assign cases, such as customers, to discrete
values, called classes or categories, of the target attribute. The target is
the attribute whose values are predicted using data mining. In this
problem, the target is the attribute attrite with two possible values:
Attriter and Non-attriter. When referring to the model build dataset,
the value Attriter indicates that the customer closed all accounts, and
Non-attriter indicates the customer has at least one account at
ABCBank. When referring to the prediction in the model apply
dataset, the value Attriter indicates that the customer is likely to
attrite and Non-attriter indicates that the customer is not likely to
attrite. The prediction is often associated with a probability indicating
how likely the customer is to attrite. When a target attribute has
only two possible values, the problem is referred to as a binary
classification problem. When a target attribute has more than two possible
values, the problem is known as a multiclass classification problem.
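
The distinction between binary and multiclass problems, and the way a probability accompanies a prediction, can be sketched in plain Java. This is a minimal illustration only and does not use the JDM API: the class name AttritionSketch, its methods, and the 0.5 probability cutoff are assumptions introduced here, not part of the ABCBank example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration; not part of JDM.
final class AttritionSketch {

    // Assumed cutoff: probabilities at or above this value map to "Attriter".
    static final double THRESHOLD = 0.5;

    // Classifies the problem by counting the distinct values of the target attribute.
    static String problemType(List<String> targetValues) {
        Set<String> distinct = new HashSet<>(targetValues);
        if (distinct.size() < 2) {
            throw new IllegalArgumentException("A target needs at least two distinct values");
        }
        return distinct.size() == 2 ? "binary" : "multiclass";
    }

    // Turns the probability that a customer attrites into a predicted class.
    static String predictedClass(double attriterProbability) {
        return attriterProbability >= THRESHOLD ? "Attriter" : "Non-attriter";
    }

    public static void main(String[] args) {
        List<String> target = Arrays.asList("Attriter", "Non-attriter", "Non-attriter");
        System.out.println(problemType(target));   // binary
        System.out.println(predictedClass(0.82));  // Attriter
    }
}
```

In practice the cutoff would be chosen to suit the business problem, for example to balance the cost of a retention offer against the cost of losing a customer.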

Data Specification: CUSTOMERS Dataset

As noted in Chapter 3, an important step in any data mining project
is to collect related data from enterprise data sources. Identifying
which attributes should be used for data mining is one of the
challenges faced by the data miner and relies on appropriate domain
knowledge of the data. In this example, we introduce a subset of
possible customer attributes as listed in Table 7-1. In real-world
scenarios, there may be hundreds or even thousands of customer attributes
available in enterprise databases.

Table 7-1 lists physical attribute details of the CUSTOMERS dataset,
which include name, data type, and description. The attribute name
refers to either a column name of a database table or a field name of a
flat file. The attribute data type refers to the allowed type of values
for that attribute. JDM defines integer, double, and string data types,
which are commonly used data types for mining. JDM conformance
rules allow a vendor to add more data types if required. The attribute
description can be used to explain the meaning of the attribute or
describe the allowed values. In general, physical data characteristics
are captured by database metadata.
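
The three physical attribute details described above (name, data type, and description) can be modeled with a small plain-Java class. This is a sketch under stated assumptions, not the JDM API: the class AttributeSpecSketch, its DataType enum, and the accepts method are hypothetical names, and the age attribute is an illustrative example that is not taken from Table 7-1.

```java
// Hypothetical model of a physical attribute; not part of JDM.
final class AttributeSpecSketch {

    // The three data types the text says JDM defines.
    enum DataType { INTEGER, DOUBLE, STRING }

    final String name;         // column name of a table or field name of a flat file
    final DataType dataType;   // allowed type of values for the attribute
    final String description;  // meaning of the attribute or its allowed values

    AttributeSpecSketch(String name, DataType dataType, String description) {
        this.name = name;
        this.dataType = dataType;
        this.description = description;
    }

    // Returns true if the raw text value can be read as this attribute's data type.
    boolean accepts(String rawValue) {
        if (rawValue == null) {
            return false;
        }
        try {
            switch (dataType) {
                case INTEGER:
                    Integer.parseInt(rawValue.trim());
                    return true;
                case DOUBLE:
                    Double.parseDouble(rawValue.trim());
                    return true;
                default:
                    return true; // any non-null text is a valid string value
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "attrite" is the target named in the text; "age" is an assumed example.
        AttributeSpecSketch attrite = new AttributeSpecSketch(
                "attrite", DataType.STRING, "Attriter or Non-attriter");
        AttributeSpecSketch age = new AttributeSpecSketch(
                "age", DataType.INTEGER, "Customer age in years (illustrative)");
        System.out.println(attrite.accepts("Attriter")); // true
        System.out.println(age.accepts("forty"));        // false
    }
}
```

A check like accepts is one way to catch values that do not match the declared data type before a dataset is used for model building.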