[Figure 2.5: two scatter-plot panels, (a) and (b); the plotted points are not recoverable from the text.]
Figure 2.5 The impact of absolute rarity on classifier performance. (a) Full set of examples and (b) half of those used in (a).
Having a small amount of training data will generally have a much larger
impact on the classification of the minority class (i.e., positive) examples. In
particular, it appears that about 90% of the space associated with the positive
class (in the solid rectangle) is covered by the learned classifier in Figure 2.5a,
while only about 70% of it is covered in Figure 2.5b. One paper summarized this
effect as follows: “A second reason why minority class examples are misclassified
more often than majority class examples is that fewer minority class examples
are likely to be sampled from the distribution D. Therefore, the training data are
less likely to include (enough) instances of all of the minority class subconcepts
in the concept space, and the learner may not have the opportunity to represent
all truly positive regions. Because of this, some minority class test examples will
be mistakenly classified as belonging to the majority class.” [4, p. 325].
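The sampling effect described in the quotation can be made concrete with a short simulation. The sketch below (plain Python; the 10% subconcept frequency and the sample sizes are invented for illustration) estimates the probability that a minority subconcept contributes no training examples at all. Analytically this miss probability is (1 − p)^n, so it grows quickly as the minority sample shrinks.

```python
import random

def prob_subconcept_missed(n_minority, p_sub, trials=100_000, seed=0):
    """Monte Carlo estimate of the chance that a minority subconcept
    occurring with probability p_sub among minority examples contributes
    zero instances to a training sample of n_minority minority examples."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # A trial "misses" if none of the n_minority draws hit the subconcept.
        if all(rng.random() >= p_sub for _ in range(n_minority)):
            missed += 1
    return missed / trials

# A subconcept holding 10% of the minority mass is almost certainly
# represented when 50 minority examples are sampled, but is often
# absent entirely when only 10 are sampled.
for n in (50, 25, 10):
    estimate = prob_subconcept_missed(n, 0.10)
    exact = (1 - 0.10) ** n
    print(f"n={n:3d}  simulated={estimate:.3f}  exact={exact:.3f}")
```

With 10 minority examples the subconcept is missed roughly a third of the time, and any test example falling in that truly positive region is then classified as majority, which is exactly the mechanism the quoted passage describes.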
Absolute rarity also applies to rare cases, which may not contain sufficiently
many training examples to be learned accurately. One study that used very simple
artificially generated datasets found that once the training set dropped below a
certain size, the error rate for the rare cases rose while the error rate for the
general cases remained at zero. This occurred because with the reduced amount
of training data, the common cases were still sampled sufficiently to be learned,
but some of the rare cases were missed entirely [7]. The same study showed,
more generally, that rare cases have a much higher misclassification rate than
common cases. We refer to this as the problem with rare cases. This research
also demonstrated something that had previously only been assumed: that rare cases
cause small disjuncts in the learned classifier. The problem with small disjuncts,
observed in many empirical studies, is that they (i.e., small disjuncts) generally
have a much higher error rate than large disjuncts [7-12]. This phenomenon is
again the result of a lack of data. The most thorough empirical study of small
disjuncts analyzed 30 real-world datasets and showed that, for the classifiers
induced from these datasets, the vast majority of errors are concentrated in the
smaller disjuncts [12].