Overview of Semi-Supervised Learning - Introduction to Semi-Supervised Learning

CHAPTER 2

Overview of Semi-Supervised Learning

2.1 LEARNING FROM BOTH LABELED AND UNLABELED DATA

As the name suggests, semi-supervised learning is somewhere between unsupervised and supervised
learning. In fact, most semi-supervised learning strategies are based on extending either unsupervised
or supervised learning to include additional information typical of the other learning paradigm.
Specifically, semi-supervised learning encompasses several different settings, including:

Semi-supervised classification. Also known as classification with labeled and unlabeled data (or
partially labeled data), this is an extension to the supervised classification problem. The training
data consists of both $l$ labeled instances $\{(x_i, y_i)\}_{i=1}^{l}$ and $u$ unlabeled instances
$\{x_j\}_{j=l+1}^{l+u}$. One typically assumes that there is much more unlabeled data than labeled
data, i.e., $u \gg l$. The goal of semi-supervised classification is to train a classifier $f$ from
both the labeled and unlabeled data, such that it is better than the supervised classifier trained
on the labeled data alone.

Constrained clustering. This is an extension to unsupervised clustering. The training data consists
of unlabeled instances $\{x_i\}_{i=1}^{n}$, as well as some “supervised information” about the
clusters. For example, such information can be so-called must-link constraints, that two instances
$x_i$, $x_j$ must be in the same cluster; and cannot-link constraints, that $x_i$, $x_j$ cannot be
in the same cluster. One can also constrain the size of the clusters. The goal of constrained
clustering is to obtain better clustering than the clustering from unlabeled data alone.
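
To make the must-link and cannot-link constraints concrete, here is a minimal Python sketch that
checks a given cluster assignment against both kinds of constraint. The function name
`violated_constraints`, the index-pair encoding of constraints, and the toy data are assumptions
made for illustration only; the sketch checks constraints and does not perform clustering.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices (i, j) of two instances x_i, x_j


def violated_constraints(
    assignment: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> List[Tuple[str, int, int]]:
    """Return every constraint that the cluster assignment breaks.

    assignment[i] is the (hypothetical) cluster id given to instance i.
    """
    broken: List[Tuple[str, int, int]] = []
    for i, j in must_link:
        # A must-link pair has to share a cluster.
        if assignment[i] != assignment[j]:
            broken.append(("must-link", i, j))
    for i, j in cannot_link:
        # A cannot-link pair has to be in different clusters.
        if assignment[i] == assignment[j]:
            broken.append(("cannot-link", i, j))
    return broken


# Toy example: five instances assigned to two clusters.
assignment = [0, 0, 1, 1, 1]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (3, 4)]
print(violated_constraints(assignment, must_link, cannot_link))
# prints [('cannot-link', 3, 4)]
```

A constrained clustering method could use such a check to reject or penalize candidate clusterings;
how the constraints enter the clustering objective is left open here.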

There are other semi-supervised learning settings, including regression with labeled and unlabeled
data, dimensionality reduction with labeled instances whose reduced feature representation is given,
and so on. This book will focus on semi-supervised classification.
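
The semi-supervised classification setting can be sketched in a few lines of Python. The example
below is an illustration under stated assumptions, not a method prescribed by this section: it uses
one-dimensional features, a nearest-centroid classifier, and a simple self-training loop as one
possible way of extending a supervised learner with unlabeled data. All function names and the toy
data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Labeled = Sequence[Tuple[float, int]]  # pairs (x_i, y_i)


def centroids(points: Labeled) -> Dict[int, float]:
    """Mean feature value of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in points:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> int:
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: abs(x - model[y]))


def train_supervised(labeled: Labeled) -> Dict[int, float]:
    """Baseline: use the l labeled instances only."""
    return centroids(labeled)


def train_self_training(
    labeled: Labeled, unlabeled: Sequence[float], rounds: int = 5
) -> Dict[int, float]:
    """Assumed strategy: repeatedly pseudo-label the unlabeled data and refit."""
    model = centroids(labeled)
    for _ in range(rounds):
        pseudo: List[Tuple[float, int]] = [(x, predict(model, x)) for x in unlabeled]
        model = centroids(list(labeled) + pseudo)
    return model


# Toy data: l = 2 labeled instances, u = 9 unlabeled instances (u > l).
labeled = [(1.0, 0), (3.0, 1)]
unlabeled = [0.0, 0.2, 0.5, 0.8, 4.0, 4.2, 4.5, 4.8, 5.0]

f_supervised = train_supervised(labeled)
f_semi = train_self_training(labeled, unlabeled)

# The unlabeled data moves the decision boundary, so the two classifiers
# can disagree on a point near it.
print(predict(f_supervised, 2.3), predict(f_semi, 2.3))  # prints 1 0
```

The sketch only shows that unlabeled data can change the learned classifier; whether the change is
an improvement depends on how well the assumptions of the chosen method match the data.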

The study of semi-supervised learning is motivated by two factors: its practical value in building
better computer algorithms, and its theoretical value in understanding learning in machines and
humans.

Semi-supervised learning has tremendous practical value. In many tasks, there is a paucity of
labeled data. The labels $y$ may be difficult to obtain because they require human annotators, special
devices, or expensive and slow experiments. For example,

In speech recognition, an instance $x$ is a speech utterance, and the label $y$ is the corresponding
transcript. For example, here are some detailed phonetic transcripts of words as they are spoken:
