What can we do if we want to build a classifier for data where only some of the data
are class-labeled, but most are not? Document classification, speech recognition, and
information extraction are just a few examples of applications in which unlabeled data
are abundant. Consider document classification, for example. Suppose we want to build
a model to automatically classify text documents like articles or web pages. In particular,
we want the model to distinguish between hockey and football documents. We have a
vast number of documents available, yet the documents are not class-labeled. Recall that
supervised learning requires a training set, that is, a set of class-labeled data. To have a
human examine and assign a class label to individual documents (to form a training set)
is time consuming and expensive.
Speech recognition requires the accurate labeling of speech utterances by trained lin-
guists. It was reported that 1 minute of speech takes 10 minutes to label, and annotating
phonemes (basic units of sound) can take 400 times as long. Information extraction sys-
tems are trained using labeled documents with detailed annotations. These are obtained
by having human experts highlight items or relations of interest in text such as the names
of companies or individuals. High-level expertise may be required for certain knowl-
edge domains such as gene and disease mentions in biomedical information extraction.
Clearly, the manual assignment of class labels to prepare a training set can be extremely
costly, time consuming, and tedious.
We study three approaches to classification that are suitable for situations where there
is an abundance of unlabeled data. Section 9.7.2 introduces semisupervised classification,
which builds a classifier using both labeled and unlabeled data. Section 9.7.3 presents
active learning, where the learning algorithm carefully selects a few of the unlabeled data
tuples and asks a human to label only those tuples. Section 9.7.4 presents transfer learning,
which aims to extract the knowledge from one or more source tasks (e.g., classifying camera
reviews) and apply the knowledge to a target task (e.g., TV reviews). Each of these strategies
can reduce the need to annotate large amounts of data, resulting in cost and time savings.
9.7.1 Multiclass Classification
Some classification algorithms, such as support vector machines, are designed for binary
classification. How can we extend these algorithms to allow for multiclass classification
(i.e., classification involving more than two classes)?
A simple approach is one-versus-all (OVA). Given m classes, we train m binary clas-
sifiers, one for each class. Classifier j is trained using tuples of class j as the positive class,
and the remaining tuples as the negative class. It learns to return a positive value for class
j and a negative value for the rest. To classify an unknown tuple, X, the classifiers vote
as an ensemble. For example, if classifier j predicts the positive class for X, then class j
gets one vote. If it predicts the negative class for X, then each of the classes except j gets
one vote. The class with the most votes is assigned to X.
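To make the OVA voting scheme concrete, the following sketch trains m binary classifiers and tallies votes as just described. It is a minimal illustration only; the function names are made up here, and logistic regression is an arbitrary choice of binary base learner (any binary classifier, such as a support vector machine, could be substituted).

# One-versus-all (OVA): one binary classifier per class, votes tallied as in the text.
# Minimal sketch; logistic regression stands in for an arbitrary binary base learner.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ova(X, y, classes):
    """Train one binary classifier per class (class j versus all other classes)."""
    classifiers = {}
    for j in classes:
        y_binary = (y == j).astype(int)               # 1 for class j, 0 for the rest
        classifiers[j] = LogisticRegression().fit(X, y_binary)
    return classifiers

def predict_ova(classifiers, X, classes):
    """Classify each tuple by letting the m binary classifiers vote."""
    votes = np.zeros((X.shape[0], len(classes)))
    for idx, j in enumerate(classes):
        pred = classifiers[j].predict(X)              # 1 = "class j", 0 = "not class j"
        votes[:, idx] += pred                         # positive prediction: one vote for j
        # negative prediction: one vote for every class except j
        votes[pred == 0] += (np.arange(len(classes)) != idx)
    return np.array(classes)[votes.argmax(axis=1)]    # class with the most votes wins

In practice, libraries provide this scheme directly; for example, scikit-learn's sklearn.multiclass.OneVsRestClassifier wraps any binary estimator in the same way.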
All-versus-all (AVA) is an alternative approach that learns a classifier for each pair
of classes. Given m classes, we construct m(m − 1)/2 binary classifiers. A classifier is trained
using the tuples of the two classes it must distinguish.
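A corresponding sketch of the AVA scheme is given below, under the same assumptions as the OVA example (the names are illustrative and logistic regression again stands in for any binary classifier). Each of the m(m − 1)/2 classifiers is trained only on tuples of its two classes and, at prediction time, casts a vote for one of them; letting the pairwise classifiers vote in this way is one common choice, not the only possible one.

# All-versus-all (AVA): one binary classifier for each of the m(m - 1)/2 class pairs.
# Minimal sketch; each pairwise classifier votes for one of its two classes.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ava(X, y, classes):
    """Train one binary classifier per pair of classes, using only tuples of that pair."""
    classifiers = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y, [a, b])                     # keep only tuples of classes a and b
        y_pair = (y[mask] == a).astype(int)           # 1 means class a, 0 means class b
        classifiers[(a, b)] = LogisticRegression().fit(X[mask], y_pair)
    return classifiers

def predict_ava(classifiers, X, classes):
    """Each of the m(m - 1)/2 classifiers votes for one of its two classes."""
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)))
    for (a, b), clf in classifiers.items():
        pred = clf.predict(X)                         # 1 is a vote for a, 0 a vote for b
        votes[:, index[a]] += pred
        votes[:, index[b]] += 1 - pred
    return np.array(classes)[votes.argmax(axis=1)]    # class with the most pairwise votes

scikit-learn offers this pairwise scheme as well, via sklearn.multiclass.OneVsOneClassifier.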
 