What can we do if we want to build a classifier for data where only some of the data
are class-labeled, but most are not? Document classification, speech recognition, and
information extraction are just a few examples of applications in which unlabeled data
are abundant. Consider document classification, for example. Suppose we want to build
a model to automatically classify text documents like articles or web pages. In particular,
we want the model to distinguish between hockey and football documents. We have a
vast number of documents available, yet the documents are not class-labeled. Recall that
supervised learning requires a training set, that is, a set of class-labeled data. To have a
human examine and assign a class label to individual documents (to form a training set)
is time consuming and expensive.
Speech recognition requires the accurate labeling of speech utterances by trained lin-
guists. It was reported that 1 minute of speech takes 10 minutes to label, and annotating
phonemes (basic units of sound) can take 400 times as long. Information extraction sys-
tems are trained using labeled documents with detailed annotations. These are obtained
by having human experts highlight items or relations of interest in text such as the names
of companies or individuals. High-level expertise may be required for certain knowl-
edge domains such as gene and disease mentions in biomedical information extraction.
Clearly, the manual assignment of class labels to prepare a training set can be extremely
costly, time consuming, and tedious.
We study three approaches to classification that are suitable for situations where there
is an abundance of unlabeled data. Section 9.7.2 introduces semisupervised classification,
which builds a classifier using both labeled and unlabeled data. Section 9.7.3 presents
active learning, where the learning algorithm carefully selects a few of the unlabeled data
tuples and asks a human to label only those tuples. Section 9.7.4 presents transfer learning,
which aims to extract the knowledge from one or more source tasks (e.g., classifying camera
reviews) and apply the knowledge to a target task (e.g., TV reviews). Each of these strategies
can reduce the need to annotate large amounts of data, resulting in cost and time savings.
9.7.1 Multiclass Classification
Some classification algorithms, such as support vector machines, are designed for binary
classification. How can we extend these algorithms to allow for multiclass classification
(i.e., classification involving more than two classes)?
A simple approach is one-versus-all (OVA). Given m classes, we train m binary clas-
sifiers, one for each class. Classifier j is trained using tuples of class j as the positive class,
and the remaining tuples as the negative class. It learns to return a positive value for class
j and a negative value for the rest. To classify an unknown tuple, X, the classifiers vote
as an ensemble. For example, if classifier j predicts the positive class for X, then class j
gets one vote. If it predicts the negative class for X, then each of the classes except j gets
one vote. The class with the most votes is assigned to X.
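To make the OVA voting scheme concrete, the following sketch trains m binary classifiers and tallies votes as just described. It is a minimal illustration only; the function names are made up here, and logistic regression is an arbitrary choice of binary base learner (any binary classifier, such as a support vector machine, could be substituted).

# One-versus-all (OVA): one binary classifier per class, votes tallied as in the text.
# Minimal sketch; logistic regression stands in for an arbitrary binary base learner.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ova(X, y, classes):
    """Train one binary classifier per class (class j versus all other classes)."""
    classifiers = {}
    for j in classes:
        y_binary = (y == j).astype(int)               # 1 for class j, 0 for the rest
        classifiers[j] = LogisticRegression().fit(X, y_binary)
    return classifiers

def predict_ova(classifiers, X, classes):
    """Classify each tuple by letting the m binary classifiers vote."""
    votes = np.zeros((X.shape[0], len(classes)))
    for idx, j in enumerate(classes):
        pred = classifiers[j].predict(X)              # 1 = "class j", 0 = "not class j"
        votes[:, idx] += pred                         # positive prediction: one vote for j
        # negative prediction: one vote for every class except j
        votes[pred == 0] += (np.arange(len(classes)) != idx)
    return np.array(classes)[votes.argmax(axis=1)]    # class with the most votes wins

In practice, libraries provide this scheme directly; for example, scikit-learn's sklearn.multiclass.OneVsRestClassifier wraps any binary estimator in the same way.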
All-versus-all (AVA) is an alternative approach that learns a classifier for each pair
of classes. Given m classes, we construct m(m − 1)/2 binary classifiers. A classifier is trained
using the tuples of the two classes it must distinguish.
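A corresponding sketch of the AVA scheme is given below, under the same assumptions as the OVA example (the names are illustrative and logistic regression again stands in for any binary classifier). Each of the m(m − 1)/2 classifiers is trained only on tuples of its two classes and, at prediction time, casts a vote for one of them; letting the pairwise classifiers vote in this way is one common choice, not the only possible one.

# All-versus-all (AVA): one binary classifier for each of the m(m - 1)/2 class pairs.
# Minimal sketch; each pairwise classifier votes for one of its two classes.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ava(X, y, classes):
    """Train one binary classifier per pair of classes, using only tuples of that pair."""
    classifiers = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y, [a, b])                     # keep only tuples of classes a and b
        y_pair = (y[mask] == a).astype(int)           # 1 means class a, 0 means class b
        classifiers[(a, b)] = LogisticRegression().fit(X[mask], y_pair)
    return classifiers

def predict_ava(classifiers, X, classes):
    """Each of the m(m - 1)/2 classifiers votes for one of its two classes."""
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)))
    for (a, b), clf in classifiers.items():
        pred = clf.predict(X)                         # 1 is a vote for a, 0 a vote for b
        votes[:, index[a]] += pred
        votes[:, index[b]] += 1 - pred
    return np.array(classes)[votes.argmax(axis=1)]    # class with the most pairwise votes

scikit-learn offers this pairwise scheme as well, via sklearn.multiclass.OneVsOneClassifier.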
 