In this example, we are more interested in the customers who are
likely to attrite, so the Attriter value is considered the positive
target value, that is, the value we are interested in predicting. As
we will see in Section 7.1.6, the positive target value is necessary
when computing lift and the ROC test metric. The Non-attriter value
is considered the negative target value. This allows us to use the
terminology false positive and false negative. A false positive (FP)
occurs when a case is known to have the negative target value, but
the model predicts the positive target value. A false negative (FN)
occurs when a case is known to have the positive target value, but
the model predicts the negative target value. The true positives are
the cases where the actual and predicted values are both the positive
target value, and the true negatives are the cases where both are the
negative target value. In Figure 7-2, note that the false negative
cost is $150, the false positive cost is $50, and all diagonal
elements have cost 0, because there is no cost for correct
predictions.
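To make the cost definitions above concrete, the following is a minimal
sketch in plain Java. It does not use the JDM API; the class name
CostMatrixSketch, the Label enum, and the four sample cases in main are
assumptions introduced only for illustration. The $150 and $50 costs are
the ones quoted from Figure 7-2.

```java
import java.util.List;

// Illustrative sketch only: plain Java, not a JDM cost matrix object.
class CostMatrixSketch {

    // Attriter is the positive target value, Non-attriter the negative one.
    enum Label { ATTRITER, NON_ATTRITER }

    // Costs quoted in the text for Figure 7-2.
    static final int FALSE_NEGATIVE_COST = 150;
    static final int FALSE_POSITIVE_COST = 50;

    // Cost of a single prediction given the known (actual) value.
    static int cost(Label actual, Label predicted) {
        if (actual == predicted) {
            return 0; // diagonal element: correct predictions cost nothing
        }
        // Actual positive predicted as negative is a false negative;
        // actual negative predicted as positive is a false positive.
        return actual == Label.ATTRITER ? FALSE_NEGATIVE_COST : FALSE_POSITIVE_COST;
    }

    // Sum of the per-case costs over a set of test cases.
    static int totalCost(List<Label> actual, List<Label> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("actual and predicted must have the same size");
        }
        int total = 0;
        for (int i = 0; i < actual.size(); i++) {
            total += cost(actual.get(i), predicted.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical cases: one TP, one FN, one FP, one TN.
        List<Label> actual = List.of(
            Label.ATTRITER, Label.ATTRITER, Label.NON_ATTRITER, Label.NON_ATTRITER);
        List<Label> predicted = List.of(
            Label.ATTRITER, Label.NON_ATTRITER, Label.ATTRITER, Label.NON_ATTRITER);
        System.out.println(totalCost(actual, predicted)); // 0 + 150 + 50 + 0 = 200
    }
}
```

Because a false negative costs three times as much as a false positive
here, a model tuned against this matrix would be expected to tolerate
more false positives in exchange for fewer missed attriters.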
7.1.5 Select Algorithm: Find the Best Fit Algorithm
Since JDM defines algorithm selection as an optional step, most
data mining tools provide a default or preselected algorithm for
each mining function. Some data mining tools automate finding the
most appropriate algorithm and its settings based on the data and
user-specified problem characteristics. If the data miner does not
specify the algorithm to be used, the JDM implementation chooses
the algorithm.
If the JDM implementation does not select the algorithm
automatically, or the data miner wants control over the algorithm
settings, the user can explicitly select the algorithm and specify
its settings. Selecting the right algorithm and settings benefits
from data mining expertise, knowledge of the available algorithms,
and often experimentation to determine which algorithm best fits the
problem. Data miners will often try different algorithms and
settings, and inspect the resulting models and test results to select
the best algorithm and settings. This section provides a high-level
overview of the algorithms supported by JDM for classification
problems: decision tree, naïve Bayes (NB), support vector machine
(SVM), and feed-forward neural networks. For more detailed
descriptions of these algorithms, refer to [Jiawei 2001]
[Witten/Frank 2005].
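The experimentation described above, in which several algorithms are
tried and the one with the best test results is kept, can be sketched
in plain Java as follows. This is not JDM API code: the
AlgorithmSelectionSketch class, the TestResult holder, and the
selectLowestCost method are hypothetical names, and the error counts in
main are invented purely to exercise the code, not results from any real
model. The sketch assumes that "best" means lowest total
misclassification cost, which is only one of several reasonable
criteria.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: picks the candidate with the lowest test cost.
class AlgorithmSelectionSketch {

    // Error counts from testing one candidate model (hypothetical holder, not a JDM type).
    static final class TestResult {
        final int falsePositives;
        final int falseNegatives;

        TestResult(int falsePositives, int falseNegatives) {
            this.falsePositives = falsePositives;
            this.falseNegatives = falseNegatives;
        }

        // Total cost of the errors; correct predictions contribute nothing.
        long cost(int falsePositiveCost, int falseNegativeCost) {
            return (long) falsePositives * falsePositiveCost
                 + (long) falseNegatives * falseNegativeCost;
        }
    }

    // Returns the name of the candidate with the lowest total cost.
    // Ties go to the candidate that appears first in the map.
    static String selectLowestCost(Map<String, TestResult> results,
                                   int falsePositiveCost, int falseNegativeCost) {
        String best = null;
        long bestCost = 0;
        for (Map.Entry<String, TestResult> entry : results.entrySet()) {
            long cost = entry.getValue().cost(falsePositiveCost, falseNegativeCost);
            if (best == null || cost < bestCost) {
                best = entry.getKey();
                bestCost = cost;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no candidate algorithms supplied");
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented counts for three candidates, for illustration only.
        Map<String, TestResult> results = new LinkedHashMap<>();
        results.put("decisionTree", new TestResult(40, 10));
        results.put("naiveBayes", new TestResult(20, 25));
        results.put("svm", new TestResult(30, 12));

        // Costs from the text: false positive $50, false negative $150.
        System.out.println(selectLowestCost(results, 50, 150));
    }
}
```

In practice a data miner would also inspect the models themselves and
other test metrics such as lift and ROC, so a single-number comparison
like this one is best treated as a first filter rather than a final
decision.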