Alternatively, we could consider general volatility in class members'
predicted labels, beyond improvement in the model's ability to predict the class.
Again, using cross-validated predictions at successive epochs, it is possible
to isolate members of each class, and observe changes in the predicted class
for each instance. For example, when the predicted label of a given instance
changes between successive epochs, we can deem the instance to have been
redistricted [38-40]. Again considering the level of volatility in a model's
predictions to be a measurement of uncertainty, we can sample classes at epoch
t according to each class's proportional measure of redistricting:
\[
p^R_t(c) = \frac{\frac{1}{|c|} \sum_{x \in c} I\left(f_{t-1}(x) \neq f_{t-2}(x)\right)}{\sum_{c'} \frac{1}{|c'|} \sum_{x \in c'} I\left(f_{t-1}(x) \neq f_{t-2}(x)\right)},
\]
where $I(\cdot)$ is an indicator function taking the value 1 if its argument is true and 0 otherwise, and $f_{t-1}(x)$ and $f_{t-2}(x)$ are the labels predicted for instance $x$ by the models trained at epochs $t-1$ and $t-2$, respectively [38-40].
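To make the sampling rule concrete, here is a minimal sketch (assuming NumPy, and hypothetical arrays of cross-validated predictions from epochs $t-1$ and $t-2$) that computes $p^R_t(c)$ for each class and then samples the class to acquire next:

```python
import numpy as np

def redistricting_distribution(y_true, pred_prev, pred_curr, classes):
    """Compute p_t^R(c): the per-class rate of instances whose predicted
    label changed between epochs t-2 and t-1, normalized over classes."""
    scores = []
    for c in classes:
        members = (y_true == c)                 # instances belonging to class c
        if members.any():
            # (1/|c|) * sum over x in c of I(f_{t-1}(x) != f_{t-2}(x))
            scores.append((pred_prev[members] != pred_curr[members]).mean())
        else:
            scores.append(0.0)
    scores = np.asarray(scores, dtype=float)
    total = scores.sum()
    # If nothing was redistricted, fall back to uniform sampling.
    return scores / total if total > 0 else np.full(len(classes), 1.0 / len(classes))

# Hypothetical usage: sample the class whose examples to request at epoch t.
classes = np.array([0, 1, 2])
y_true = np.array([0, 0, 1, 1, 2, 2])
pred_prev = np.array([0, 1, 1, 1, 2, 0])        # f_{t-2}(x), cross-validated
pred_curr = np.array([0, 0, 1, 2, 2, 0])        # f_{t-1}(x), cross-validated
p = redistricting_distribution(y_true, pred_prev, pred_curr, classes)
next_class = np.random.choice(classes, p=p)     # p = [0.5, 0.5, 0.0] here
```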
6.8.1.2 Expected Class Utility

The previously described ACS heuristics rely on the assumption that adding examples belonging to a particular class will improve the predictive accuracy with respect to that class. However, they do not directly estimate the utility of adding members of a particular class to the model's overall performance. Instead, it may be preferable to select classes whose instances' presence in the training set will reduce the model's misclassification cost by the greatest amount in expectation.
Let $\mathrm{cost}(c_i \mid c_j)$ be the cost of predicting $c_i$ on an instance $x$ whose true label is $c_j$. Then the expected empirical misclassification cost over a sample dataset, $D$, is:
\[
R = \frac{1}{|D|} \sum_{x \in D} \sum_i P(c_i \mid x)\, \mathrm{cost}(c_i \mid y),
\]
where $y$ is the correct class for a given $x$. Typically, in the ACS setting, this expectation would be taken over the training set (e.g., $D = T$), preferably using cross-validation. In order to reduce this risk, we would like to select examples from the class $c$ that leads to the greatest reduction in this expected risk [39].
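As a sketch of this computation (assuming a hypothetical array of cross-validated class probabilities, integer class labels $0, \dots, k-1$, and a user-supplied $k \times k$ cost matrix):

```python
import numpy as np

def empirical_risk(proba, y_true, cost):
    """R = (1/|D|) * sum_{x in D} sum_i P(c_i | x) * cost(c_i | y).

    proba  : (n, k) array of P(c_i | x), ideally cross-validated
    y_true : (n,) array of true class indices y
    cost   : (k, k) matrix with cost[i, j] = cost(c_i | c_j)
    """
    # cost[:, y_true].T selects, for each instance, the column cost(. | y).
    per_instance = (proba * cost[:, y_true].T).sum(axis=1)
    return per_instance.mean()

# Hypothetical two-class example with asymmetric costs.
proba = np.array([[0.8, 0.2], [0.3, 0.7]])
y_true = np.array([0, 1])
cost = np.array([[0.0, 1.0],    # predicting c_0: cost 1 when truth is c_1
                 [5.0, 0.0]])   # predicting c_1: cost 5 when truth is c_0
R = empirical_risk(proba, y_true, cost)   # (1.0 + 0.3) / 2 = 0.65
```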
Consider a predictive model $P_{T_c}(\cdot \mid x)$ built on the training set $T$ supplemented with an arbitrary example belonging to class $c$. Given the opportunity to add a class-representative example to the training pool, we would like to select the class that reduces the expected risk by the greatest amount:
\[
c^{*} = \operatorname*{argmax}_{c}\, U(c),
\]
where
\[
U(c) = \frac{1}{|D|} \sum_{x \in D} \sum_i P_T(c_i \mid x)\, \mathrm{cost}(c_i \mid y) - \frac{1}{|D|} \sum_{x \in D} \sum_i P_{T_c}(c_i \mid x)\, \mathrm{cost}(c_i \mid y).
\]
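One way this selection might be realized is sketched below, under explicit assumptions: scikit-learn estimators, integer class labels $0, \dots, k-1$ so that `predict_proba` columns align with the cost matrix, and a hypothetical `example_for_class` mapping standing in for the "arbitrary example belonging to class $c$". It reuses the `empirical_risk` helper defined above.

```python
import numpy as np
from sklearn.base import clone

def expected_class_utility(base_model, X_train, y_train, X_eval, y_eval,
                           cost, example_for_class):
    """Estimate U(c) = risk(P_T) - risk(P_{T_c}) for each candidate class.

    example_for_class maps each class label c to one representative feature
    vector, the stand-in for an 'arbitrary example belonging to class c'.
    """
    model_T = clone(base_model).fit(X_train, y_train)
    risk_T = empirical_risk(model_T.predict_proba(X_eval), y_eval, cost)

    utilities = {}
    for c, x_c in example_for_class.items():
        X_aug = np.vstack([X_train, x_c])         # T supplemented with x_c
        y_aug = np.append(y_train, c)
        model_Tc = clone(base_model).fit(X_aug, y_aug)
        risk_Tc = empirical_risk(model_Tc.predict_proba(X_eval), y_eval, cost)
        utilities[c] = risk_T - risk_Tc           # reduction in expected risk
    return utilities

# c* = argmax_c U(c), e.g.:
# utilities = expected_class_utility(LogisticRegression(), X, y, X, y, cost, reps)
# best_class = max(utilities, key=utilities.get)
```

In practice, one would estimate both risks by cross-validation, as the text recommends, and could average $U(c)$ over several candidate examples per class rather than relying on a single representative and a single fit.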