probability estimates should be properly calibrated to be aligned with the test data,
if possible [43, 49, 50].
Prior work has repeatedly demonstrated the benefits of performing ACS
beyond either drawing random examples from a pool for acquisition or
using uniformly balanced selection. In many cases, however, merely recasting
what would typically be an AL problem as an ACS problem and selecting
examples uniformly among the classes can provide results far better
than what would be possible with AL alone. For instance, the learning curves
presented in Figure 6.12 compare such uniform guided learning with AL and
simple random selection. Once the class skew becomes substantial, providing
the model with an essentially random but class-balanced training set far
exceeds the generalization performance achievable by an AL strategy or by
random selection. More intelligent ACS strategies may make this difference
even more pronounced, and they should be considered whenever the development
effort of incorporating such strategies is outweighed by the savings from
reduced data acquisition costs.
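The uniform guided learning just described can be sketched as follows. This is an illustrative simulation, not a method from the chapter: the function name, signature, and the `oracle_label` callback (a stand-in for a human oracle who can supply an example's class) are all assumptions of the sketch.

```python
import random
from collections import Counter

def balanced_acquire(pool, oracle_label, per_class, classes):
    """Sketch of uniform guided learning: visit pool examples in random
    order, keeping each one only while its class is still short of the
    per-class quota. The result is an essentially random yet
    class-balanced training set, even under heavy class skew."""
    counts = Counter()
    training = []
    for x in random.sample(pool, len(pool)):  # random order, no replacement
        y = oracle_label(x)
        if counts[y] < per_class:
            training.append((x, y))
            counts[y] += 1
        if all(counts[c] >= per_class for c in classes):
            break  # quota met for every class
    return training
```

Note that a real guided-learning oracle searches for members of the requested class directly; the rejection loop above merely simulates that behavior against a fixed pool for illustration.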
6.8.2 Feature-Based Learning and Active Dual Supervision
While traditional supervised learning is by far the most prevalent classification
paradigm encountered in the research literature, it is not the only approach for
incorporating human knowledge into a predictive system. By leveraging, for
instance, class associations with certain feature values, predictive systems can be
trained that offer potentially excellent generalization performance without requir-
ing the assignment of class labels to individual instances. Consider the example
domain of predicting the sentiment of movie reviews. In this context, it is clear
that the presence of words such as “amazing” and “thrilling” carries an associ-
ation with the positive class, while terms such as “boring” and “disappointing”
evoke negative sentiment [51]. Gathering this kind of annotation leverages an
oracle's prior experience with the class polarity of certain feature values—in
this case, the emotion that certain terms tend to evoke. The systematic selec-
tion of feature values for labeling by a machine learning system is referred to
as active feature-value labeling. 11 The general setting where class associations
are actively sought for both feature values and particular examples is known
as active dual supervision (ADS). The process of selection for AFL and ADS is shown in Figures 6.15
and 6.16, respectively.
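To make the movie-review example concrete, a minimal sketch of how labeled feature polarities alone can drive prediction is shown below. The lexicon contents and function name are hypothetical illustrations, not a model from the literature discussed next: each term whose polarity an oracle has supplied simply votes for its associated class.

```python
def polarity_classify(tokens, feature_polarity, default="pos"):
    """Classify a tokenized document using only oracle-labeled feature
    polarities (token -> +1 for positive, -1 for negative): known terms
    vote, and the sign of the total decides the class. No instance
    labels are required."""
    score = sum(feature_polarity.get(t, 0) for t in tokens)
    if score == 0:
        return default  # no labeled features present; fall back
    return "pos" if score > 0 else "neg"
```

For instance, with the polarities mentioned above (`{"amazing": 1, "thrilling": 1, "boring": -1, "disappointing": -1}`), a review containing "amazing" and "thrilling" is scored positive. Practical models weight and smooth such knowledge rather than counting raw votes, as the specialized models discussed below do.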
Of course, incorporating the class polarities associated with certain feature
values typically requires specialized models whose functional form has been
designed to leverage feature-based background knowledge. While a survey of
models for incorporating such feature-value/class polarities is beyond the scope
of this chapter, the interested reader is directed to any number of related
papers (cf. [52-58]). However, while sophisticated models of this type have
11 For brevity, this is often shortened to AFL, a moniker best suited to domains with binary
features.