of this model. The feature values for each instance are then collected and the
complete instances are added to the training set. The model is reconstructed and
the process is repeated until n examples are obtained (because the budget is
exhausted or some other stopping criterion, such as a computational limit, is met).
Note that this situation can be considered to be a special case of the instance
completion setting of active feature-value acquisition (cf. [44]). It is a degenerate
special case because, before the selection, there is no information at all about the
instances other than their classes.
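The acquisition loop just described can be sketched in a few lines. This is an illustrative skeleton, not an implementation from the text: the function names, the callable interfaces, and the uniform class-selection placeholder (which the uncertainty-based policies below would replace) are all assumptions.

```python
import random

def active_class_selection(classes, acquire, train, n_examples, seed=0):
    """Sketch of the ACS loop: pick a class, acquire a complete instance
    for it, retrain, and repeat until n_examples are obtained.

    classes    : list of class labels available for selection
    acquire    : callable(class_label) -> feature vector for a new instance
    train      : callable(training_set) -> model
    n_examples : stopping criterion (e.g., the acquisition budget)
    """
    rng = random.Random(seed)
    training_set = []
    model = train(training_set)            # initial model, no instances yet
    while len(training_set) < n_examples:
        c = rng.choice(classes)            # placeholder policy: uniform over classes
        x = acquire(c)                     # collect feature values for the instance
        training_set.append((x, c))        # add the completed instance
        model = train(training_set)        # reconstruct the model
    return model, training_set
```

In practice the call to `rng.choice` would be replaced by sampling from the per-class score distribution produced by one of the selection heuristics discussed below.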
For the specific problem at the heart of ACS, the extreme lack of infor-
mation to guide selection leads to the development of unique uncertainty and
utility estimators, which, in the absence of predictive covariates, require unique
approximations. 10 While alternative approaches to ACS have emerged, for the-
matic clarity, uncertainty-based and expected-utility-based approaches will be
presented first. Note that because effective classification requires that both sides
of a prediction boundary be represented, unlike typical AL techniques, ACS
typically samples classes from their respective score distributions [45, 46].
6.8.1.1 Uncertainty-Based Approaches This family of techniques for
performing ACS is based on the volatility in the predictions made about certain
classes—those classes whose cross-validated predictions are subject to the most
change between successive epochs of instance selection are likely to be based
on an uncertain predictor and amenable to refinement by the incorporation of
additional training data [38, 40]. Analogous to the case of more traditional
uncertainty-based data acquisition, several heuristics have been devised to
capture the notion of variability.
One measure of the uncertainty of a learned model is how volatile its predictive
performance is in the face of new training data. Consider typical learning curves,
such as those presented in Figure 6.6. Notice that the models are much more
volatile at the left side of the figure, showing large changes in generalization
performance for the same amount of new training data. Intuitively, as a
predictor gains knowledge of the problem space, it solidifies, exhibiting less
change and greater certainty in the face of new data. For ACS, we might
wonder if the learning curves will be equally steep regardless of the class of
the training data [38-40]. With this in mind, we can select instances at epoch
t from the classes in proportion to their improvement in accuracy from epoch
t − 2 to epoch t − 1. For example, we could use cross-validation to estimate the
generalization performance of the classifier with respect to each class, A(c);
class c can then be sampled according to:
p_t(c) = \frac{\max(0, A_{t-1}(c) - A_{t-2}(c))}{\sum_{c'} \max(0, A_{t-1}(c') - A_{t-2}(c'))}
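This sampling rule, selecting each class in proportion to its positive accuracy gain over the last two epochs, can be sketched as follows. The dictionary interface and the uniform fallback when no class has improved are assumptions for illustration, not details from the text:

```python
def acs_class_probabilities(acc_prev, acc_prev2):
    """Compute p_t(c) from per-class cross-validated accuracy estimates.

    acc_prev  : dict mapping class -> accuracy A(c) at epoch t-1
    acc_prev2 : dict mapping class -> accuracy A(c) at epoch t-2
    """
    # Only positive improvements contribute, per the max(0, ...) in the rule.
    gains = {c: max(0.0, acc_prev[c] - acc_prev2[c]) for c in acc_prev}
    total = sum(gains.values())
    if total == 0.0:
        # No class improved; fall back to uniform sampling (an assumption,
        # since the normalizer would otherwise be zero).
        return {c: 1.0 / len(gains) for c in gains}
    return {c: g / total for c, g in gains.items()}
```

A class whose cross-validated accuracy changed most between successive epochs thus receives the largest share of the next acquisitions, matching the intuition that its predictor is still uncertain and amenable to refinement.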
10 In realistic settings, such as guided learning (a potential application of ACS), this
lack-of-information assumption may be softened.