from the whole dataset is preserved in the training and test sets. Stratification has been shown to help reduce the variance of the estimated error, especially for datasets with many classes.
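Stratified partitioning can be sketched by grouping the sample indices by class and dealing each class's samples round-robin across the k folds. This gives every fold roughly the same class proportions as the whole dataset. The function name `stratified_folds`, the fixed random seed and the round-robin assignment are assumptions made for illustration, not a prescribed procedure.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds so that every class is
    spread as evenly as possible across the folds (illustrative sketch)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin, so each fold gets about len/k of them.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


labels = ["pos"] * 4 + ["neg"] * 16
for fold in stratified_folds(labels, 4):
    print(sum(labels[i] == "pos" for i in fold), sum(labels[i] == "neg" for i in fold))
```

With 4 positive and 16 negative labels and k = 4, each fold receives one positive and four negative indices, which matches the 1:4 ratio of the whole dataset.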
Another cross-validation variation is the bootstrapping method, which is an n-fold cross-validation with n set to the number of initial samples. It samples the training instances uniformly with replacement and uses a leave-one-out scheme. In each iteration, the classifier is trained on a set of n − 1 samples randomly selected from the set of initial samples, S. Testing is performed using the remaining subset.
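The iteration described above, in which the classifier is trained on n − 1 samples and tested on the sample that remains, can be sketched as follows. This sketch implements only that leave-one-out partition and does not draw samples with replacement. The one-nearest-neighbour classifier on a single numeric feature is a hypothetical stand-in for any learner, and the function names are illustrative.

```python
def nearest_neighbour_predict(train, x):
    """Hypothetical stand-in learner: 1-nearest neighbour on one numeric feature.
    `train` is a list of (feature, label) pairs; ties go to the earliest pair."""
    closest = min(train, key=lambda sample: abs(sample[0] - x))
    return closest[1]


def leave_one_out_error(samples):
    """Estimate the error rate with n iterations, where n = len(samples).
    Each iteration trains on n - 1 samples and tests on the held-out sample."""
    errors = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # the n - 1 training samples
        if nearest_neighbour_predict(train, x) != y:
            errors += 1
    return errors / len(samples)


data = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.3, "b")]
print(leave_one_out_error(data))  # 0.0
```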
4.2.3 Alternatives to the Accuracy Measure
Accuracy is not a sufficient measure for evaluating a model when the class distribution is imbalanced. There are cases where the estimated accuracy rate may mislead one about the quality of a derived classifier. When the dataset contains significantly more majority-class than minority-class instances, a classifier that always selects the majority class obtains high accuracy without recognizing a single minority-class instance. Therefore, in these cases, the sensitivity and specificity measures can be used as an alternative to the accuracy measure [Han and Kamber (2001)].
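The pitfall described above can be illustrated with a small worked example. The 95/5 class split and the trivial majority-class predictor are hypothetical; they only show how high accuracy can coexist with failing to recognize any minority-class instance.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced dataset: 95 majority ("neg") and 5 minority ("pos") instances.
y_true = ["neg"] * 95 + ["pos"] * 5
# A trivial "classifier" that always predicts the majority class.
y_pred = ["neg"] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.95
minority_found = sum(t == p == "pos" for t, p in zip(y_true, y_pred))
print(minority_found)  # 0
```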
Sensitivity (also known as recall) assesses how well the classifier can
recognize positive samples and is defined as
$$\text{Sensitivity} = \frac{\text{true positive}}{\text{positive}} \tag{4.2}$$
where true positive is the number of true positive samples and positive is the number of positive samples.
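Equation (4.2) can be computed directly from lists of true and predicted labels. In this sketch the function name, the list-of-labels input and the label "pos" for the positive class are assumptions.

```python
def sensitivity(y_true, y_pred, positive="pos"):
    """Eq. (4.2): number of true positives divided by the number of positive samples.
    Raises ZeroDivisionError if y_true contains no positive samples."""
    positives = sum(1 for t in y_true if t == positive)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    return true_positives / positives


y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg"]
print(sensitivity(y_true, y_pred))  # 0.75
```

Three of the four positive samples are recognized, so the sensitivity is 3/4.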
Specificity measures how well the classifier can recognize negative
samples. It is defined as
$$\text{Specificity} = \frac{\text{true negative}}{\text{negative}} \tag{4.3}$$
where true negative is the number of true negative examples and negative is the number of negative samples.
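Equation (4.3) mirrors (4.2) with the roles of the classes swapped. As before, the function name and the label "neg" for the negative class are assumptions of this sketch.

```python
def specificity(y_true, y_pred, negative="neg"):
    """Eq. (4.3): number of true negatives divided by the number of negative samples.
    Raises ZeroDivisionError if y_true contains no negative samples."""
    negatives = sum(1 for t in y_true if t == negative)
    true_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == p == negative)
    return true_negatives / negatives


y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg"]
print(specificity(y_true, y_pred))  # 0.5
```

Only two of the four negative samples are predicted as negative, so the specificity is 2/4.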
Another well-known performance measure is precision. Precision measures what fraction of the examples classified as “positive” are indeed positive. This measure is useful for evaluating crisp classifiers that are used to classify an entire dataset. Formally:
$$\text{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \tag{4.4}$$
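Equation (4.4) can be sketched the same way. Its denominator is the number of samples predicted as positive, whether correctly or not. The function name and the "pos" label are assumptions.

```python
def precision(y_true, y_pred, positive="pos"):
    """Eq. (4.4): true positives / (true positives + false positives).
    Raises ZeroDivisionError if no sample is predicted as positive."""
    pairs = list(zip(y_true, y_pred))
    true_positives = sum(1 for t, p in pairs if t == p == positive)
    false_positives = sum(1 for t, p in pairs if p == positive and t != positive)
    return true_positives / (true_positives + false_positives)


y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg"]
print(precision(y_true, y_pred))  # 0.6
```

Here five samples are predicted as positive, and three of them are truly positive, giving 3/5.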