by allowing the oracle to interact with the base learner, confusing instances,
those that “fool” the model, can be sought out from the problem space and used
for subsequent training in the form of human-guided uncertainty sampling. This
interaction with the base learner can be extended a step further: allowing
humans to challenge the model's predictive accuracy across the problem space may
reveal “problem areas,” portions of the example space where the base model
performs poorly that might not be revealed through traditional techniques such as
cross-validation studies [42].
Guided learning, along with alternative problem settings such as that faced by
the artificial nose discussed earlier, deals with situations where an oracle is able
to provide “random” examples in arbitrary class proportions. It now becomes
interesting to consider just what this class proportion should be. This problem
appears to be the inverse of the one faced by AL: labels essentially
come for free, while the independent feature values are completely unknown and
must be gathered at a cost. In this setting, it becomes important to consider the
question: “In what proportion should classes be represented in a training set of
a certain size?” [43].
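To make the question concrete, the following minimal Python sketch turns a chosen class proportion into per-class instance counts for a training set of fixed size n. The function name `allocate_by_proportion` and the largest-remainder rounding rule are assumptions made for illustration; the text does not prescribe how a proportion is converted into counts.

```python
def allocate_by_proportion(n, proportions):
    """Split n instances across classes according to `proportions`.

    `proportions` maps class label -> non-negative weight (need not sum to 1).
    Largest-remainder rounding guarantees the counts sum to exactly n.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    exact = {c: n * p / total for c, p in proportions.items()}
    counts = {c: int(x) for c, x in exact.items()}
    leftover = n - sum(counts.values())
    # Hand the remaining instances to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# A 1:3 positive-to-negative ratio over 12 requested instances.
print(allocate_by_proportion(12, {"pos": 1, "neg": 3}))  # {'pos': 3, 'neg': 9}
```

The open question raised in the text is which weights to pass in; the sketch only shows that, once a proportion is fixed, the acquisition budget follows mechanically.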
Let us call the problem of proportioning class labels in a selection of n
additional training instances “active class selection” (ACS) [38-40, 43]. This
process is illustrated in Figure 6.14. In this setting, large, class-conditioned
(virtual) pools of available instances with completely hidden feature values are
assumed. At each epoch, t, of the ACS process, the task is to leverage the current
model when selecting examples from these pools in a proportion believed
to have the greatest effectiveness for improving the generalization performance
[Figure 6.14 diagram labels: Feature values, Training set, Model, Unexplored instances]
Figure 6.14 Active class selection: gathering instances from random class-conditioned
fonts in a proportion believed to offer greatest improvement in generalization performance.
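One ACS epoch of the kind described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited work: the class-conditioned pools are simulated by hypothetical 1-D Gaussian generators, the base learner is a toy nearest-centroid classifier, and the proportion rule (request more instances from classes on which the current model errs more, measured on an assumed labeled holdout set) is just one plausible heuristic. All function names are invented for the example.

```python
import random


def make_pool(mean, rng):
    """Hypothetical class-conditioned pool: yields a 1-D feature value on demand."""
    return lambda: rng.gauss(mean, 1.0)


def fit_centroids(data):
    """Toy base learner: one centroid (mean feature value) per class."""
    return {c: sum(xs) / len(xs) for c, xs in data.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def class_error(centroids, holdout):
    """Per-class error rate of the current model on a labeled holdout set."""
    return {
        c: sum(predict(centroids, x) != c for x in xs) / len(xs)
        for c, xs in holdout.items()
    }


def acs_round(data, pools, holdout, n):
    """One ACS epoch: request n new instances, split by current per-class error."""
    centroids = fit_centroids(data)
    errors = class_error(centroids, holdout)
    # Small floor (assumed) so a class with zero error is not starved entirely.
    weights = {c: errors[c] + 0.05 for c in pools}
    total = sum(weights.values())
    counts = {c: int(n * w / total) for c, w in weights.items()}
    # Give any rounding leftover to the worst-performing class.
    worst = max(weights, key=weights.get)
    counts[worst] += n - sum(counts.values())
    # Drawing from a pool reveals feature values for a label chosen in advance.
    for c, k in counts.items():
        data[c].extend(pools[c]() for _ in range(k))
    return counts


rng = random.Random(0)
pools = {"pos": make_pool(1.0, rng), "neg": make_pool(-1.0, rng)}
data = {c: [draw() for _ in range(5)] for c, draw in pools.items()}
holdout = {c: [draw() for _ in range(200)] for c, draw in pools.items()}
for t in range(3):
    print(t, acs_round(data, pools, holdout, n=20))
```

In practice the per-class error would more likely be estimated by cross-validation on the training set itself, since a large labeled holdout with known feature values is exactly what this setting lacks; the holdout is used here only to keep the sketch short.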