# the other rows are going into the test set
testing <- setdiff(1:n.points, training)
# define the test set to be the other rows
test <- subset(data[testing, ], select = c(Age, Income))
# this is the subset of labels for the training set
cl <- data$Credit[training]
# subset of labels for the test set, we're withholding these
true.labels <- data$Credit[testing]
Pick an evaluation metric
How do you evaluate whether your model did a good job?
This isn't easy or universal: you may decide you want to penalize
certain kinds of misclassification more than others. For instance,
false negatives may be much worse than false positives. Coming up
with the evaluation metric could be something you work on with a
domain expert.
For example, if you were using a classification algorithm to predict
whether someone had cancer or not, you would want to minimize false
negatives (misdiagnosing someone as not having cancer when they
actually do), so you could work with a doctor to tune your evaluation
metric.
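One way to make "penalize some mistakes more than others" concrete is a cost-weighted error rate. The following is a minimal sketch under stated assumptions, not a method from the text: the labels are made-up toy data, `weighted.cost` is a hypothetical helper, and the 10-to-1 cost ratio is an arbitrary placeholder for a value a domain expert would help choose.

```r
# Toy labels (hypothetical): 1 = has the condition, 0 = does not
truth <- c(1, 1, 1, 0, 0, 0, 0, 0)
pred  <- c(1, 0, 0, 0, 0, 0, 1, 0)

# Assumed costs; in practice these come from domain knowledge
cost.fn <- 10  # cost of a false negative (a missed case)
cost.fp <- 1   # cost of a false positive (a false alarm)

# Hypothetical helper: average cost per prediction
weighted.cost <- function(truth, pred, cost.fn, cost.fp) {
  n.fn <- sum(truth == 1 & pred == 0)  # false negatives
  n.fp <- sum(truth == 0 & pred == 1)  # false positives
  (cost.fn * n.fn + cost.fp * n.fp) / length(truth)
}

# Plain misclassification rate treats both kinds of error alike
plain.error <- mean(pred != truth)

# The weighted version is dominated by the two missed cases
cost <- weighted.cost(truth, pred, cost.fn, cost.fp)
```

With these toy values the plain error rate counts three mistakes equally, while the weighted cost is driven mostly by the two false negatives, which is the behavior you would want if missed diagnoses are the expensive error.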
Be careful, though: if you really wanted to have no false negatives,
you could just tell everyone they have cancer. So there is a
trade-off between sensitivity and specificity, where sensitivity is
here defined as the probability of correctly diagnosing an ill patient
as ill, and specificity is here defined as the probability of
correctly diagnosing a well patient as well.
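The trade-off can be seen in a small sketch. Everything here is illustrative: the labels are toy data, and `sens.spec` is a hypothetical helper, not a function from the text or from any package. It compares a classifier that labels everyone as ill with one that makes a few mistakes of each kind.

```r
# Hypothetical helper: sensitivity and specificity from 0/1 label vectors
sens.spec <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)  # ill, diagnosed ill
  fn <- sum(truth == 1 & pred == 0)  # ill, diagnosed well
  tn <- sum(truth == 0 & pred == 0)  # well, diagnosed well
  fp <- sum(truth == 0 & pred == 1)  # well, diagnosed ill
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Toy data: 3 ill patients (1) and 5 well patients (0)
truth <- c(1, 1, 1, 0, 0, 0, 0, 0)

# "Tell everyone they have cancer": no false negatives at all
everyone.ill <- rep(1, length(truth))
degenerate <- sens.spec(truth, everyone.ill)

# A classifier that misses one ill patient and raises one false alarm
some.errors <- c(1, 1, 0, 0, 0, 0, 1, 0)
balanced <- sens.spec(truth, some.errors)
```

The all-ill classifier reaches a sensitivity of 1 only by driving specificity to 0, whereas the second classifier gives up some sensitivity (2/3) in exchange for a specificity of 0.8.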
Other Terms for Sensitivity and Specificity
Sensitivity is also called the true positive rate or recall; which
name you use depends on what academic field you come from, but they
all mean the same thing. Specificity is also called the true
negative rate. There are also the false positive rate and the false
negative rate, which are mostly known by those names.
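For reference, the four rates can be written in terms of confusion-matrix counts. The symbols TP, FP, TN, and FN (true positives, false positives, true negatives, false negatives) are notation introduced here for illustration, with "positive" meaning "ill" as in the cancer example:

```latex
\begin{aligned}
\text{sensitivity (true positive rate)} &= \frac{TP}{TP + FN} \\
\text{specificity (true negative rate)} &= \frac{TN}{TN + FP} \\
\text{false positive rate} &= \frac{FP}{FP + TN} = 1 - \text{specificity} \\
\text{false negative rate} &= \frac{FN}{FN + TP} = 1 - \text{sensitivity}
\end{aligned}
```

Because each false rate is the complement of one of the true rates, reporting sensitivity and specificity already determines the other two.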
Another evaluation metric you could use is precision, defined in
Chapter 5. Some of the same formulas go by different names because
different academic disciplines have developed these ideas
separately. So precision and recall are the quantities