it is possible to estimate this quantity directly. Let q enumerate over all possible
feature values that may be queried for labels. We can estimate the expected utility
of such a query by EU(q) = Σ_{k=1}^{K} P(q = c_k) U(q = c_k)/ω_q, where P(q = c_k)
is the probability of the instance or feature queried being associated with class c_k,
ω_q is the cost of query q, and U is some measure of the utility of q. 13 This results
in the decision-theoretic optimal policy, which is to ask for the feature labels that,
once incorporated into the data, will result in the highest expected increase in
classification performance [51, 63].
6.8.2.3 Active Dual Supervision

Active dual supervision (ADS) is concerned with situations where it is
possible to query an oracle for labels associated with both feature values and
examples. Even though such a paradigm is concerned with the simultaneous
acquisition of feature and example labels, the simplest approach is to treat each
acquisition problem separately and then mix the selections somehow. Active
interleaving performs a separate (un)certainty-based ordering on features and
on examples, and chooses selections from the top of each ordering according to
some predefined proportion. The different nature of feature and example
uncertainty values leads to incompatible quantities existing on different scales,
preventing a single, unified ordering. However, expected utility can be used to
acquisition. As mentioned earlier, we are estimating the utility of a certain feature
of example query q as: EU(q)
= k = 1 P(q
c k )/ω q . Using a single
utility function for both features and examples and incorporating label acquisition
costs, costs and benefits of the different types of acquisition can be optimized
directly [51].
=
c k )
U
(q
=
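The interleaving scheme described above can be sketched as follows. This is a simplified illustration rather than the procedure of [51]: the two query lists are assumed to be pre-sorted by decreasing uncertainty, and the mixing proportion is a hypothetical parameter.

```python
import random

def interleave(feature_queries, example_queries, feature_fraction,
               rng=random.Random(0)):
    """Active interleaving: repeatedly draw the next query from the top of
    the feature ordering with probability feature_fraction, otherwise from
    the top of the example ordering, until both lists are exhausted."""
    selections = []
    f, e = list(feature_queries), list(example_queries)
    while f or e:
        take_feature = f and (not e or rng.random() < feature_fraction)
        selections.append(f.pop(0) if take_feature else e.pop(0))
    return selections
```

With feature_fraction at the extremes the behavior is deterministic: 1.0 drains the feature ordering first, 0.0 drains the example ordering first; intermediate values mix the two streams in the stated proportion on average.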
6.9 CONCLUSION
This chapter presents a broad perspective on the relationship between AL (the
selective acquisition of labeled examples for training statistical models) and
imbalanced data classification tasks, where at least one of the classes in the train-
ing set is represented by far fewer instances than the other classes. Our
comprehensive analysis of this relationship identifies two common associations:
(i) the ability of AL to deal with the data imbalance problem that, when
manifested in a training set, typically degrades the generalization performance
of an induced model, and (ii) the impact class imbalance may have on the
ability of an otherwise reasonable AL scheme to select informative examples,
a phenomenon that becomes particularly acute as the imbalance tends toward
the extreme.
To mitigate the impact of class imbalance on the generalization performance
of a predictive model, in Sections 6.3 and 6.4 we present AL as an alternative
to more conventional resampling strategies. An AL strategy may select a dataset
13 For instance, cross-validated accuracy or log-gain may be used.