Given the option of incorporating prior knowledge associated with certain fea-
ture values into the predictive behavior of a model, the question then becomes:
which feature examples should be selected for labeling? Initially, this may seem
to be a deflection, replacing the search for useful information of one kind with
another. However, there are several reasons why seeking feature values to “label”
may be preferred to labeling examples. The most obvious reason for selecting
feature values for labeling is that traditional example labeling is often "slow."
It may take many labeled movie reviews to teach the model that "amazing" has
a positive association, rather than being just another uninformative term that
happens to occur in positive reviews by coincidence. In this way, the cold-start
problem faced in AL 12 may be less acute in AFL and ADS: while complex
feature/class relationships may be difficult to capture using AFL, reasonable
generalization performance is often achievable with few requests to an oracle.
Second, the labor costs associated with assigning a class polarity to a certain
feature value are often quite low: it is easy for a human to associate the term
"terrible" with negative movie reviews, while labeling one particular movie
review as positive or negative requires reading the entire document. Of course,
not every term is polar; in the ongoing
example of movie reviews, it is easy to imagine that a few terms have a natural
association with the positive or negative class, while most terms on their own
do not have such a polarity. However, this imbalance between polar and non-
polar feature values is often far less acute than the imbalance between classes
in many machine learning problems. A problem domain with a class imbalance
of 1,000,000:1 may still have 1 in 10 features exhibiting some meaningful
class association. Perhaps even more importantly, the ratio between positively
and negatively linked feature values (for instance) may be far more balanced
than the ratio between those classes in the wild. In fact, there is not necessarily a
relationship between the base rate and the ratio of strongly identifiable positively
and negatively associated feature values. While selecting useful feature values in
practice is still often a challenging problem, experience has shown that it is often
more informative to select random features for labeling than random examples.
More intelligent selection heuristics can make this preference for AFL and ADS
even stronger.
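To make the idea concrete, the following is a minimal sketch (illustrative, not the chapter's method) of how a single labeled feature can substitute for many labeled documents. It injects an oracle's feature label ("amazing" is positive) as a pseudo-count prior in a multinomial Naive Bayes sentiment model; the vocabulary, training documents, and prior strength are all invented for illustration.

```python
# Illustrative sketch: a feature label as a pseudo-count prior in
# multinomial Naive Bayes. All data below is made up.
import math
from collections import defaultdict

VOCAB = ["amazing", "boring", "dull", "plot", "great", "movie"]
CLASSES = ["pos", "neg"]

# Tiny labeled set; note "amazing" appears only in a negative review,
# purely by coincidence.
docs = [
    (["amazing", "boring"], "neg"),
    (["dull", "plot"], "neg"),
    (["great", "movie"], "pos"),
]

# Per-class word counts with Laplace smoothing of 1.
counts = {c: defaultdict(lambda: 1.0) for c in CLASSES}
for words, label in docs:
    for w in words:
        counts[label][w] += 1.0

# Oracle feature label: "amazing" is positive, encoded as a
# pseudo-count added to the positive class (the strength, 10.0,
# is an arbitrary illustrative choice).
counts["pos"]["amazing"] += 10.0

def log_likelihood(words, c):
    total = sum(counts[c][w] for w in VOCAB)
    return sum(math.log(counts[c][w] / total) for w in words)

def classify(words):
    return max(CLASSES, key=lambda c: log_likelihood(words, c))

print(classify(["amazing"]))  # -> "pos"
```

Without the pseudo-count, the coincidental occurrence of "amazing" in a negative training review would push this one-word document to "neg"; the single feature label corrects that without any additional labeled documents.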
Again, giving a thorough survey of selection heuristics for performing AFL
is beyond the scope of this chapter. However, we will provide the reader with
a brief overview of the techniques typically employed for such tasks. As in
many selective data acquisition tasks for machine learning, we see two common
themes: uncertainty-based selection and expected-utility-based approaches. In the
following, we briefly present some of the more popular techniques for AFL,
delineated accordingly. We then discuss techniques for ADS, which select
features and examples for labeling simultaneously.
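As a concrete (and invented) illustration of the first theme, an uncertainty-based AFL heuristic might rank candidate features by the entropy of the model's current estimate of P(positive | feature present), querying the oracle about the most ambiguous terms first. The probabilities below are made up for the sketch.

```python
# Illustrative sketch of uncertainty-based feature selection:
# rank features by the binary entropy of the model's current
# class-association estimate. All probabilities are invented.
import math

p_pos = {
    "amazing": 0.55,    # seen in both classes; model is unsure
    "terrible": 0.05,   # already clearly negative
    "projector": 0.50,  # no signal either way
    "great": 0.95,      # already clearly positive
}

def binary_entropy(p):
    """Entropy of a Bernoulli(p) class association, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Query order: most uncertain feature first.
ranked = sorted(p_pos, key=lambda f: binary_entropy(p_pos[f]), reverse=True)
print(ranked[:2])  # -> ['projector', 'amazing']
```

Note that a genuinely non-polar term like "projector" can look maximally uncertain under this score, which is one motivation for expected-utility approaches that instead estimate how much each candidate label would actually improve the model.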
6.8.2.1 Uncertainty-Based AFL

By far the most prevalent class of AL heuristics, uncertainty-based approaches
do not have a direct analogue to the selection
12 Recall Section 6.7.