Java Data Mining Concepts - Java Data Mining: Strategy, Standard, and Practice

In this example, we are more interested in the

customers who are likely to attrite, so the Attriter value is considered

the positive target value, that is, the value we are interested in predicting. As

we will see in Section 7.1.6, the positive target value is necessary

when computing lift and the ROC test metric. The Non-attriter value

is considered the negative target value. This allows us to use the

terminology false positive and false negative. A false positive (FP) occurs

when a case is known to have the negative target value, but the

model predicts the positive target value. A false negative (FN) occurs

when a case is known to have the positive target value, but the model

predicts the negative target value. The true positives are the cases

where the predicted and actual positive target values are in

agreement, and true negatives are the cases where the predicted and

actual negative target values are in agreement. In Figure 7-2, note that

the false negative cost is $150 and the false positive cost is $50, and all

diagonal elements have a cost of “0,” because there is no cost for

correct predictions.
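
The definitions above can be made concrete with a small sketch. It is not JDM API code: the class and method names are hypothetical and the five sample cases are invented purely for illustration. Only the target values (Attriter, Non-attriter) and the two costs ($150 for a false negative, $50 for a false positive) come from the text.

```java
// Illustrative sketch only; AttritionCostSketch and its methods are
// hypothetical names and are not part of the JDM API.
final class AttritionCostSketch {
    static final String POSITIVE = "Attriter";      // positive target value
    static final String NEGATIVE = "Non-attriter";  // negative target value

    // Costs quoted in the text; correct predictions cost nothing.
    static final int FALSE_NEGATIVE_COST = 150;
    static final int FALSE_POSITIVE_COST = 50;

    /** Counts outcomes and returns them as {TP, FP, FN, TN}. */
    static int[] confusionCounts(String[] actual, String[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean actualPositive = POSITIVE.equals(actual[i]);
            boolean predictedPositive = POSITIVE.equals(predicted[i]);
            if (actualPositive && predictedPositive) {
                tp++;   // predicted and actual agree on the positive value
            } else if (!actualPositive && predictedPositive) {
                fp++;   // actual negative, predicted positive
            } else if (actualPositive) {
                fn++;   // actual positive, predicted negative
            } else {
                tn++;   // predicted and actual agree on the negative value
            }
        }
        return new int[] {tp, fp, fn, tn};
    }

    /** Total misclassification cost; only FP and FN contribute. */
    static int totalCost(int[] counts) {
        return counts[1] * FALSE_POSITIVE_COST + counts[2] * FALSE_NEGATIVE_COST;
    }

    public static void main(String[] args) {
        // Invented sample cases, for illustration only.
        String[] actual    = {POSITIVE, POSITIVE, NEGATIVE, NEGATIVE, NEGATIVE};
        String[] predicted = {POSITIVE, NEGATIVE, POSITIVE, NEGATIVE, NEGATIVE};
        int[] c = confusionCounts(actual, predicted);
        System.out.println("TP=" + c[0] + " FP=" + c[1] + " FN=" + c[2] + " TN=" + c[3]);
        System.out.println("Total cost = $" + totalCost(c));
    }
}
```

With these sample cases there is one false positive and one false negative, so the total cost is $50 + $150 = $200. The asymmetry in the costs means a model that misses an attriter is penalized three times as heavily as one that flags a loyal customer.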

7.1.5 Select Algorithm: Find the Best Fit Algorithm

Since JDM defines algorithm selection as an optional step, most

data mining tools provide a default or preselected algorithm for

each mining function. Some data mining tools automate finding the

most appropriate algorithm and its settings based on the data and

user-specified problem characteristics. If the data miner does not

specify the algorithm to be used, the JDM implementation chooses

the algorithm.

If the JDM implementation does not select the algorithm automatically,

or the data miner wants control over the algorithm settings,

the user can explicitly select the algorithm and specify its settings.

Selection of the right algorithm and settings benefits from data mining

expertise, knowledge of the available algorithms, and often

experimentation to determine which algorithm best fits the problem.

Data miners will often try different algorithms and settings, and

inspect the resulting models and test results to select the best algorithm

and settings. This section provides a high-level overview of the

algorithms supported by JDM for classification problems: decision

tree, naïve Bayes (NB), support vector machine (SVM), and feedforward

neural networks. For more detailed descriptions of these algorithms,

refer to [Jiawei 2001] and [Witten/Frank 2005].
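
The experimentation described above, trying several algorithms and comparing their test results, can be sketched as follows. This is a minimal illustration under stated assumptions, not the JDM API: the class and method names are hypothetical, and the error counts in `main` are made-up placeholders rather than measurements of any real model. The comparison criterion assumed here is the total misclassification cost, using the $150 false negative and $50 false positive costs from the attrition example; in practice a data miner may weigh other test metrics as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; AlgorithmChoiceSketch is a hypothetical name
// and is not part of the JDM API.
final class AlgorithmChoiceSketch {
    static final int FALSE_POSITIVE_COST = 50;   // from the attrition example
    static final int FALSE_NEGATIVE_COST = 150;  // from the attrition example

    /** Total misclassification cost for one candidate's test run. */
    static int testCost(int falsePositives, int falseNegatives) {
        return falsePositives * FALSE_POSITIVE_COST
             + falseNegatives * FALSE_NEGATIVE_COST;
    }

    /**
     * Returns the name of the candidate with the lowest test cost.
     * Each map value is {falsePositives, falseNegatives}. On a tie the
     * earlier entry wins; an empty map yields null.
     */
    static String cheapest(Map<String, int[]> errorsByAlgorithm) {
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (Map.Entry<String, int[]> entry : errorsByAlgorithm.entrySet()) {
            int cost = testCost(entry.getValue()[0], entry.getValue()[1]);
            if (cost < bestCost) {
                bestCost = cost;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder error counts, invented for illustration only.
        Map<String, int[]> results = new LinkedHashMap<>();
        results.put("decision tree", new int[] {30, 12});
        results.put("naive Bayes", new int[] {45, 8});
        results.put("support vector machine", new int[] {20, 14});
        results.put("feedforward neural network", new int[] {25, 13});

        for (Map.Entry<String, int[]> e : results.entrySet()) {
            System.out.println(e.getKey() + ": $"
                + testCost(e.getValue()[0], e.getValue()[1]));
        }
        System.out.println("Lowest test cost: " + cheapest(results));
    }
}
```

Because the selection depends only on the recorded test results, the same comparison can be rerun whenever an algorithm's settings are changed and the model is rebuilt and retested.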
