Information Technology Reference
In-Depth Information
Previously, we have discussed a plethora of AL techniques specifically tuned
for the high skew setting [18-21] as well as techniques where the geometry
and feature density of the problem space are explicitly included when making
instance selections [13-15, 17, 35, 36]. These techniques, as initially appealing
as they may seem, may fail just as badly as traditional AL techniques. Class
skew and subconcept rarity discussed in Section 6.6 may be sufficient to thwart
them completely [33, 41].
However, in many of these extremely difficult settings, we can task humans to
search the problem space for rare cases, using tools (such as search engines) and
possibly interacting with the base learner. Consider the motivating example of
hate speech classification on the web (from above). While an active learner may
experience difficulty in exploring the details of this rare class, a human oracle
armed with a search interface is likely to expose examples of hate speech quite
easily. In fact, given the coverage of modern web search engines, a human can
produce interesting examples from a much larger sample of the problem space
far beyond that which is likely to be contained in a sample pool for AL. This
is critical due to hardware-imposed constraints on the size of the pool that an
active learner is able to choose from—for example, a random draw of several
hundred thousand examples from the problem space may not even contain any
members of the minority class or of rare disjuncts!
Guided learning is the general process of utilizing oracles to search the problem
space, using their domain expertise to seek instances representing the interesting
regions of the problem space. Figure 6.13 presents the general guided learning
setting. Here, given some interface enabling the search over the domain in ques-
tion, an oracle searches for interesting examples, which are either supplemented
with an implicit label by the oracle, or sent for explicit labeling as a second
step. These examples are then added to the training set and a model is retrained.
Oracles can leverage their background knowledge of the problem being faced. In
addition to simply being charged with the acquisition of class-specific examples,
Feature values
+
Oracle
Instance
space
+
Seek out
useful instances
+
+
New instance
Training set
Figure 6.13 Guided learning: an oracle selecting useful examples from the instance
space.
Search WWH ::




Custom Search