[Figure: three scatter-plot panels, x-axis weight (lbs.) 80-110, vertical axis 40-70; image not recoverable.]
Figure 1.2: Hierarchical agglomerative clustering results for k = 2, 3, 4 on the 100 little green men data.
alien from his or her weight and height. Alternatively, you may want to predict whether an alien is a
juvenile or an adult using weight and height. To explain how to approach these tasks, we need more
definitions.
Definition 1.7. Label. A label y is the desired prediction on an instance x.

Labels may come from a finite set of values, e.g., {female, male}. These distinct values are called classes. The classes are usually encoded by integer numbers, e.g., female = -1, male = 1, and thus y ∈ {-1, 1}. This particular encoding is often used for binary (two-class) labels, and the two classes are generically called the negative class and the positive class, respectively. For problems with more than two classes, a traditional encoding is y ∈ {1, . . . , C}, where C is the number of classes. In general, such an encoding does not imply structure in the classes. That is to say, the two classes encoded by y = 1 and y = 2 are not necessarily closer than the two classes y = 1 and y = 3. Labels may also take continuous values in R. For example, one may attempt to predict the blood pressure of little green aliens based on their height and weight.
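The two discrete encodings above can be sketched in a few lines of code. This is a minimal illustration, not part of the text: the class names ("adult", "juvenile", and the color names) are invented for the example, and the choice of which class maps to which integer is arbitrary.

```python
# Sketch of the label encodings described above (hypothetical class names).

def encode_binary(labels, positive="adult"):
    """Encode two-class labels as y in {-1, +1}."""
    return [+1 if lab == positive else -1 for lab in labels]

def encode_multiclass(labels):
    """Encode C distinct classes as y in {1, ..., C}.
    The integers impose no structure: class 1 is not
    'closer' to class 2 than it is to class 3."""
    classes = sorted(set(labels))                 # the C class names
    index = {c: i + 1 for i, c in enumerate(classes)}
    return [index[lab] for lab in labels]

print(encode_binary(["adult", "juvenile", "adult"]))        # [1, -1, 1]
print(encode_multiclass(["red", "green", "blue", "green"])) # [3, 2, 1, 2]
```

Note that `encode_multiclass` assigns integers by sorted class name, which is one arbitrary convention among many; any bijection between classes and {1, . . . , C} would serve.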
In supervised learning, the training sample consists of pairs, each containing an instance x and a label y: {(x_i, y_i)}_{i=1}^n. One can think of y as the label on x provided by a teacher, hence the name supervised learning. Such (instance, label) pairs are called labeled data, while instances alone without labels (as in unsupervised learning) are called unlabeled data. We are now ready to define supervised learning.
Definition 1.8. Supervised learning. Let the domain of instances be X, and the domain of labels be Y. Let P(x, y) be an (unknown) joint probability distribution on instances and labels X × Y. Given a training sample {(x_i, y_i)}_{i=1}^n drawn i.i.d. from P(x, y), supervised learning trains a function f : X → Y in some function family F, with the goal that f(x) predicts the true label y on future data x, where (x, y) is drawn i.i.d. from P(x, y) as well.
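Definition 1.8 can be made concrete with a toy sketch. Here the function family F is taken to be 1-nearest-neighbor classifiers, which is an assumption made for illustration only; the training pairs are invented (weight, height) instances labeled juvenile (-1) or adult (+1), echoing the prediction task mentioned above.

```python
import math

# Toy instance of Definition 1.8 (all numbers invented):
# a labeled training sample {(x_i, y_i)}, and a function f that
# predicts the label of a new x. Here f is a 1-nearest-neighbor
# rule, one simple choice of function family F.

training_sample = [
    ((85.0, 42.0), -1),   # x = (weight lbs., height in.), y = -1 (juvenile)
    ((90.0, 45.0), -1),
    ((105.0, 64.0), +1),  # y = +1 (adult)
    ((110.0, 68.0), +1),
]

def f(x):
    """Predict the label of x as that of its nearest training instance."""
    nearest = min(training_sample,
                  key=lambda pair: math.dist(x, pair[0]))
    return nearest[1]

print(f((108.0, 66.0)))  # 1  (closest to the adult instances)
print(f((87.0, 43.0)))   # -1 (closest to the juvenile instances)
```

A real learner would of course choose f from a richer family and with some notion of minimizing expected error under P(x, y), but the ingredients of the definition, sample, function family, and prediction on future x, are all present.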