9.4 Using Classification Interfaces
The javax.datamining.supervised.classification package contains the
classification function interfaces, such as ClassificationSettings,
ClassificationModel, ClassificationApplySettings, and ClassificationTestMetrics.
This section illustrates the classification interfaces and methods by
extending the CustomerAttrition example in Listing 9-4, which showed a
simple classification model build using DME default settings. Here we
extend that code to illustrate advanced classification settings,
algorithm settings, model contents, and model evaluation. We also
provide code that applies the model to identify customers likely to
attrite, in line with the customer attrition problem discussed in Section 7.1.
9.4.1 Classification Settings
The ClassificationSettings interface allows us to specify outliers, prior
probabilities, a cost matrix, and various types of classification
algorithms. Table 9-10 lists the methods of the ClassificationSettings,
SupervisedSettings, and BuildSettings interfaces. BuildSettings is the
base interface for all function-level settings and provides the methods
common to all mining functions. SupervisedSettings inherits from
BuildSettings and adds settings specific to supervised functions, such as
the target attribute name. ClassificationSettings inherits from
SupervisedSettings and adds classification-specific settings.
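The following minimal, self-contained Java sketch makes the inheritance chain concrete. It does not use the javax.datamining API. The interfaces BuildSettings, SupervisedSettings, and ClassificationSettings below are simplified, hypothetical stand-ins. Every method name in them (setValidRange, treatOutlier, setPriorProbabilities, setCost, getCost) and the attribute name capital_gains are illustrative assumptions. The sketch shows which level of the hierarchy each kind of setting belongs to. It also mimics the behavior this section describes: values outside a valid range are treated as missing, and the cost matrix defaults to 0 on the diagonal and 1 elsewhere until individual cells are overridden. The check that priors sum to 1 is the sketch's own validation choice.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the settings hierarchy; NOT the javax.datamining API.
public class SettingsHierarchySketch {

    // Base level: settings shared by every mining function (here, outlier handling).
    interface BuildSettings {
        void setValidRange(String attribute, double min, double max);
        // Returns null (a missing value) when the value falls outside the valid range.
        Double treatOutlier(String attribute, double value);
    }

    // Supervised functions add a target attribute.
    interface SupervisedSettings extends BuildSettings {
        void setTargetAttributeName(String name);
        String getTargetAttributeName();
    }

    // Classification adds prior probabilities and a cost matrix.
    interface ClassificationSettings extends SupervisedSettings {
        void setPriorProbabilities(Map<String, Double> priors);
        Map<String, Double> getPriorProbabilities();
        void setCost(String actual, String predicted, double cost);
        double getCost(String actual, String predicted);
    }

    static class SimpleClassificationSettings implements ClassificationSettings {
        private final Map<String, double[]> validRanges = new HashMap<>();
        private final Map<String, Double> priors = new LinkedHashMap<>();
        private final Map<String, Double> costOverrides = new HashMap<>();
        private String target;

        @Override public void setValidRange(String attribute, double min, double max) {
            validRanges.put(attribute, new double[] {min, max});
        }

        @Override public Double treatOutlier(String attribute, double value) {
            double[] range = validRanges.get(attribute);
            boolean outlier = range != null && (value < range[0] || value > range[1]);
            return outlier ? null : Double.valueOf(value); // outliers become missing values
        }

        @Override public void setTargetAttributeName(String name) { target = name; }
        @Override public String getTargetAttributeName() { return target; }

        @Override public void setPriorProbabilities(Map<String, Double> p) {
            double sum = 0.0;
            for (double v : p.values()) sum += v;
            if (Math.abs(sum - 1.0) > 1e-9) {
                throw new IllegalArgumentException("Priors must sum to 1, got " + sum);
            }
            priors.clear();
            priors.putAll(p);
        }

        @Override public Map<String, Double> getPriorProbabilities() { return priors; }

        @Override public void setCost(String actual, String predicted, double cost) {
            costOverrides.put(actual + "|" + predicted, cost);
        }

        // Default cost: 0 on the diagonal, 1 off the diagonal, unless a cell was overridden.
        @Override public double getCost(String actual, String predicted) {
            Double c = costOverrides.get(actual + "|" + predicted);
            if (c != null) return c;
            return actual.equals(predicted) ? 0.0 : 1.0;
        }
    }

    public static void main(String[] args) {
        SimpleClassificationSettings s = new SimpleClassificationSettings();
        s.setTargetAttributeName("Attrite");
        s.setValidRange("capital_gains", 2_000, 1_000_000);
        Map<String, Double> priors = new LinkedHashMap<>();
        priors.put("Attriter", 0.2);
        priors.put("Non-attriter", 0.8);
        s.setPriorProbabilities(priors);
        s.setCost("Attriter", "Non-attriter", 150.0); // false negative
        s.setCost("Non-attriter", "Attriter", 50.0);  // false positive
        System.out.println(s.treatOutlier("capital_gains", 5_000_000)); // null
        System.out.println(s.getCost("Attriter", "Non-attriter"));     // 150.0
    }
}
```

Outlier handling sits in the base interface so that every mining function can reuse it. Only the classification level carries class-specific structures such as priors and misclassification costs.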
Listing 9-8 illustrates the use of the classification settings methods
to specify outliers, prior probabilities, and a cost matrix. Lines 5 and 6
specify outliers for the capital gains attribute using the
setOutlierIdentification and setOutlierTreatment methods of the
BuildSettings interface. Outlier identification sets the valid value
range for capital gains ($2,000 to $1,000,000), and the outlier treatment
option specifies how algorithms must treat outliers; in this example,
outliers are treated as missing values. Lines 8 to 12 create and set the
prior probabilities for the Attrite target attribute values: in the
original dataset, Attriters make up 20 percent and Non-attriters 80 percent.
Lines 14 to 24 create the cost matrix discussed in Section 7.1.4. The
CostMatrixFactory.create method creates a default cost matrix from a
given CategorySet object, with a cost of 1 in every off-diagonal cell
and 0 in every diagonal cell. An application can then override the
default costs using the CostMatrix.setCellValue method. In this example,
as shown in lines 20 and 21, the cost is set to $150 for a false negative
and $50 for a false positive. Recall that the cost matrix can be reused across