of features should be sufficient to train a good classifier. Suppose we split the feature set into two sets and train two classifiers, f1 and f2, where each classifier is trained on a different set. Then, f1 and f2 are used to predict the class labels for the unlabeled data, Xu. Each classifier then teaches the other: the tuple having the most confident prediction from f1 is added (along with its label) to the set of labeled data for f2. Similarly, the tuple having the most confident prediction from f2 is added to the set of labeled data for f1. The method is summarized in Figure 9.17. Cotraining is less sensitive to errors than self-training. A difficulty is that its assumptions may not hold in practice; that is, it may not be possible to split the features into mutually exclusive, class-conditionally independent sets.
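The teaching loop described above can be sketched in code. This is only a toy illustration, not the text's algorithm: the data are hypothetical 1-D "views," and each of f1 and f2 is a 1-nearest-neighbor classifier whose confidence is the (negated) distance to its nearest labeled point.

```python
# Toy cotraining sketch (hypothetical data; 1-NN classifiers stand in for f1, f2).

def nn_predict(labeled, x):
    """1-NN prediction; confidence is the negated distance to the nearest labeled point."""
    nearest_x, nearest_y = min(labeled, key=lambda lv: abs(lv[0] - x))
    return nearest_y, -abs(nearest_x - x)

def cotrain(view1, view2, labels, unlabeled1, unlabeled2, rounds=1):
    L1 = list(zip(view1, labels))   # labeled set for f1 (its view of each tuple)
    L2 = list(zip(view2, labels))   # labeled set for f2
    U = list(range(len(unlabeled1)))  # indices of still-unlabeled tuples
    for _ in range(rounds):
        if not U:
            break
        # f1's most confident prediction teaches f2 (tuple moves with its label).
        i = max(U, key=lambda j: nn_predict(L1, unlabeled1[j])[1])
        y, _ = nn_predict(L1, unlabeled1[i])
        L2.append((unlabeled2[i], y))
        U.remove(i)
        if not U:
            break
        # Similarly, f2's most confident prediction teaches f1.
        i = max(U, key=lambda j: nn_predict(L2, unlabeled2[j])[1])
        y, _ = nn_predict(L2, unlabeled2[i])
        L1.append((unlabeled1[i], y))
        U.remove(i)
    return L1, L2
```

With two well-separated labeled points per view, each classifier confidently labels one unlabeled tuple and passes it, with its label, to the other.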
Alternative approaches to semi-supervised learning exist. For example, we can model
the joint probability distribution of the features and the labels. For the unlabeled data,
the labels can then be treated as missing data. The EM algorithm (Chapter 11) can be
used to maximize the likelihood of the model. Methods using support vector machines
have also been proposed.
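The EM idea can be sketched as follows. This is a toy illustration under stated assumptions, not the text's algorithm: hypothetical 1-D data, two classes modeled as unit-variance Gaussians, and the labels of the unlabeled tuples treated as missing values that the E-step estimates softly.

```python
# Toy semi-supervised EM sketch (hypothetical 1-D data; two classes modeled
# as equal-variance Gaussians with the variance fixed at 1).
import math

def gauss(x, mu):
    # Unnormalized unit-variance Gaussian likelihood.
    return math.exp(-0.5 * (x - mu) ** 2)

def semi_supervised_em(labeled, unlabeled, iters=20):
    # Initialize each class mean from the labeled data only.
    mu = {}
    for c in {y for _, y in labeled}:
        xs = [x for x, y in labeled if y == c]
        mu[c] = sum(xs) / len(xs)
    classes = sorted(mu)
    for _ in range(iters):
        # E-step: soft "missing" labels (class responsibilities) for unlabeled data.
        resp = []
        for x in unlabeled:
            w = [gauss(x, mu[c]) for c in classes]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: re-estimate means from labeled plus soft-labeled data.
        for k, c in enumerate(classes):
            num = sum(x for x, y in labeled if y == c) \
                + sum(r[k] * x for r, x in zip(resp, unlabeled))
            den = sum(1 for _, y in labeled if y == c) \
                + sum(r[k] for r in resp)
            mu[c] = num / den
    return mu
```

Each iteration increases the likelihood of the model; the unlabeled points simply pull the class means toward themselves in proportion to their responsibilities.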
9.7.3 Active Learning
Active learning is an iterative type of supervised learning that is suitable for situations
where data are abundant, yet the class labels are scarce or expensive to obtain. The learn-
ing algorithm is active in that it can purposefully query a user (e.g., a human oracle) for
labels. The number of tuples used to learn a concept this way is often much smaller than
the number required in typical supervised learning.
How does active learning work to overcome the labeling bottleneck? To keep costs down, the active learner aims to achieve high accuracy using as few labeled instances as possible. Let D be all of the data under consideration. Various strategies exist for active
learning on D . Figure 9.18 illustrates a pool-based approach to active learning. Suppose
that a small subset of D is class-labeled. This set is denoted L . U is the set of unlabeled
data in D . It is also referred to as a pool of unlabeled data. An active learner begins with
L as the initial training set. It then uses a querying function to carefully select one or
more data samples from U and requests labels for them from an oracle (e.g., a human
annotator). The newly labeled samples are added to L , which the learner then uses in
a standard supervised way. The process repeats, with the learner aiming for high accuracy from as few labeled tuples as possible. Active learning algorithms are typically evaluated using learning curves, which plot accuracy as a function of the number of instances queried.
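The pool-based loop just described can be sketched in code. This is a toy illustration under stated assumptions, not the book's Figure 9.18: hypothetical 1-D data, a 1-NN learner, an oracle implemented as a labeling function, and a querying function that treats the pool point farthest from the labeled set as the least certain.

```python
# Toy pool-based active learning sketch (hypothetical 1-D data; 1-NN learner).

def nearest(L, x):
    """Return the (point, label) pair in L closest to x."""
    return min(L, key=lambda lv: abs(lv[0] - x))

def active_learn(L, U, oracle, budget):
    L = list(L)   # initial labeled training set
    U = list(U)   # pool of unlabeled data
    for _ in range(budget):
        if not U:
            break
        # Querying function: pick the pool point farthest from any labeled
        # point (a simple stand-in for "least certain").
        x = max(U, key=lambda u: abs(nearest(L, u)[0] - u))
        L.append((x, oracle(x)))   # request the label from the oracle
        U.remove(x)
    # Use the enlarged labeled set in a standard supervised way (1-NN here).
    return L, lambda x: nearest(L, x)[1]
```

Plotting the resulting classifier's accuracy against the number of queries made yields the learning curve mentioned above.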
Most active learning research focuses on how to choose the data tuples to be queried. Several frameworks have been proposed. Uncertainty sampling is the most common, where the active learner queries the tuples about which it is least certain how to label. Other strategies work to reduce the version space, that is, the subset
of all hypotheses that are consistent with the observed training tuples. Alternatively,
we may follow a decision-theoretic approach that estimates expected error reduction.
This selects tuples that would result in the greatest reduction in the total number of
 