Using the JDM API - Java Data Mining: Strategy, Standard, and Practice

9.4 Using Classification Interfaces

The javax.datamining.supervised.classification package contains the classification function interfaces, such as ClassificationSettings, ClassificationModel, ClassificationApplySettings, and ClassificationTestMetrics. This section illustrates the use of the classification interfaces and methods by extending the CustomerAttrition example in Listing 9-4, which showed a simple classification model build using DME default settings. Here we extend that code to illustrate advanced classification settings, algorithm settings, model contents, and model evaluation. We also provide code to apply the model to identify customers likely to attrite, in line with the customer attrition problem discussed in Section 7.1.

9.4.1 Classification Settings

The ClassificationSettings interface allows us to specify outliers, prior probabilities, a cost matrix, and various types of classification algorithms. Table 9-10 lists the methods of the ClassificationSettings, SupervisedSettings, and BuildSettings interfaces. BuildSettings is the base interface for all function-level settings and provides the methods that are common across all mining functions. SupervisedSettings inherits from BuildSettings to specify settings specific to supervised functions, such as the target attribute name. ClassificationSettings inherits from SupervisedSettings to specify classification-specific settings.
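
The layering described above can be sketched in plain Java. This is only an illustration under stated assumptions: the interfaces BuildSettingsSketch, SupervisedSettingsSketch, and ClassificationSettingsSketch, their method names, the attribute name capital_gains, and the target value labels are hypothetical stand-ins, not the javax.datamining interfaces or the methods listed in Table 9-10. The sketch shows one settings object carrying a common setting (a valid value range), a supervised setting (the target attribute name), and a classification setting (prior probabilities).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the three settings levels; these are NOT the
// javax.datamining interfaces, only an illustration of the layering.
interface BuildSettingsSketch {
    // Common to every mining function: valid value range of an attribute.
    void setValidRange(String attributeName, double low, double high);

    // Returns {low, high}, or null when no range was set for the attribute.
    double[] getValidRange(String attributeName);
}

interface SupervisedSettingsSketch extends BuildSettingsSketch {
    // Supervised functions add the notion of a target attribute.
    void setTargetAttributeName(String name);

    String getTargetAttributeName();
}

interface ClassificationSettingsSketch extends SupervisedSettingsSketch {
    // Classification adds settings tied to discrete target values.
    void setPriorProbability(String targetValue, double probability);

    // Returns NaN when no prior was set for the target value.
    double getPriorProbability(String targetValue);
}

// One concrete object serves all three levels of the hierarchy.
final class SimpleClassificationSettings implements ClassificationSettingsSketch {
    private final Map<String, double[]> validRanges = new HashMap<>();
    private final Map<String, Double> priors = new HashMap<>();
    private String targetAttributeName;

    @Override
    public void setValidRange(String attributeName, double low, double high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        validRanges.put(attributeName, new double[] {low, high});
    }

    @Override
    public double[] getValidRange(String attributeName) {
        double[] range = validRanges.get(attributeName);
        return range == null ? null : range.clone();
    }

    @Override
    public void setTargetAttributeName(String name) {
        this.targetAttributeName = name;
    }

    @Override
    public String getTargetAttributeName() {
        return targetAttributeName;
    }

    @Override
    public void setPriorProbability(String targetValue, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        priors.put(targetValue, probability);
    }

    @Override
    public double getPriorProbability(String targetValue) {
        return priors.getOrDefault(targetValue, Double.NaN);
    }
}

final class SettingsHierarchyDemo {
    // Code written against the base interface accepts any settings object.
    static boolean isWithinRange(BuildSettingsSketch settings,
                                 String attributeName, double value) {
        double[] range = settings.getValidRange(attributeName);
        return range != null && value >= range[0] && value <= range[1];
    }

    public static void main(String[] args) {
        SimpleClassificationSettings settings = new SimpleClassificationSettings();
        settings.setValidRange("capital_gains", 2000.0, 1000000.0); // assumed attribute name
        settings.setTargetAttributeName("Attrite");
        settings.setPriorProbability("Attriter", 0.2);      // assumed value label
        settings.setPriorProbability("Non-attriter", 0.8);  // assumed value label

        System.out.println(isWithinRange(settings, "capital_gains", 5000.0));
        System.out.println(settings.getTargetAttributeName());
        System.out.println(settings.getPriorProbability("Attriter"));
    }
}
```

Because isWithinRange is declared against the base stand-in, it would accept settings for any mining function, which is the practical benefit of placing common methods at the top of the hierarchy.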

Listing 9-8 illustrates the use of the classification settings methods to specify outliers and a cost matrix. Lines 5 and 6 show the specification of outliers for the capital gains attribute using the setOutlierIdentification and setOutlierTreatment methods of BuildSettings. Outlier identification sets the valid value range for capital gains ($2,000 to $1,000,000). The outlier treatment option specifies how algorithms must treat outliers; in this example, outliers are treated as missing values. Lines 8 to 12 show the creation and setting of the prior probabilities for the Attrite target attribute values; Attriters are 20 percent and Non-attriters are 80 percent in the original dataset. Lines 14 to 24 show the creation of the cost matrix discussed in Section 7.1.4. The CostMatrixFactory.create method creates the default cost matrix from a given CategorySet object, with a cost value of “1” for all nondiagonal cells and a value of “0” for all diagonal cells of the matrix. Using the CostMatrix.setCellValue method, an application can override the default cost values. In this example, as shown in lines 20 and 21, the cost value is set to $150 for a false negative and $50 for a false positive. Recall that the cost matrix can be reused across
