[Figure 2.5: two scatter-plot panels, (a) and (b); the plotted points are not recoverable from the text.]
Figure 2.5 The impact of absolute rarity on classifier performance. (a) Full set of examples and (b) half of those used in (a).
Having a small amount of training data will generally have a much larger
impact on the classification of the minority class (i.e., positive) examples. In
particular, it appears that about 90% of the space associated with the positive
class (in the solid rectangle) is covered by the learned classifier in Figure 2.5a,
while only about 70% of it is covered in Figure 2.5b. One paper summarized this
effect as follows: “A second reason why minority class examples are misclassified
more often than majority class examples is that fewer minority class examples
are likely to be sampled from the distribution D. Therefore, the training data are
less likely to include (enough) instances of all of the minority class subconcepts
in the concept space, and the learner may not have the opportunity to represent
all truly positive regions. Because of this, some minority class test examples will
be mistakenly classified as belonging to the majority class.” [4, p. 325].
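The sampling effect described in the quotation can be made concrete with a short simulation. The sketch below (plain Python; the 10% subconcept frequency and the sample sizes are invented for illustration) estimates the probability that a minority subconcept contributes no training examples at all. Analytically this miss probability is (1 − p)^n, so it grows quickly as the minority sample shrinks.

```python
import random

def prob_subconcept_missed(n_minority, p_sub, trials=100_000, seed=0):
    """Monte Carlo estimate of the chance that a minority subconcept
    occurring with probability p_sub among minority examples contributes
    zero instances to a training sample of n_minority minority examples."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # A trial "misses" if none of the n_minority draws hit the subconcept.
        if all(rng.random() >= p_sub for _ in range(n_minority)):
            missed += 1
    return missed / trials

# A subconcept holding 10% of the minority mass is almost certainly
# represented when 50 minority examples are sampled, but is often
# absent entirely when only 10 are sampled.
for n in (50, 25, 10):
    estimate = prob_subconcept_missed(n, 0.10)
    exact = (1 - 0.10) ** n
    print(f"n={n:3d}  simulated={estimate:.3f}  exact={exact:.3f}")
```

With 10 minority examples the subconcept is missed roughly a third of the time, and any test example falling in that truly positive region is then classified as majority, which is exactly the mechanism the quoted passage describes.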
Absolute rarity also applies to rare cases, which may not contain sufficiently
many training examples to be learned accurately. One study that used very simple
artificially generated datasets found that once the training set dropped below a
certain size, the error rate for the rare cases rose while the error rate for the
general cases remained at zero. This occurred because with the reduced amount
of training data, the common cases were still sampled sufficiently to be learned,
but some of the rare cases were missed entirely [7]. The same study showed,
more generally, that rare cases have a much higher misclassification rate than
common cases. We refer to this as the problem with rare cases. This research
also demonstrated something that had previously only been assumed: that rare cases
cause small disjuncts in the learned classifier. The problem with small disjuncts,
observed in many empirical studies, is that they (i.e., small disjuncts) generally
have a much higher error rate than large disjuncts [7-12]. This phenomenon is
again the result of a lack of data. The most thorough empirical study of small
disjuncts analyzed 30 real-world datasets and showed that, for the classifiers
induced from these datasets, the vast majority of errors are concentrated in the
smaller disjuncts [12].