[Figure: three scatter-plot panels, x-axis weight (lbs.) 80-110, vertical axis 40-70; image not recoverable.]
Figure 1.2: Hierarchical agglomerative clustering results for k = 2, 3, 4 on the 100 little green men data.
alien from his or her weight and height. Alternatively, you may want to predict whether an alien is a
juvenile or an adult using weight and height. To explain how to approach these tasks, we need more
definitions.
Definition 1.7. Label. A label y is the desired prediction on an instance x.

Labels may come from a finite set of values, e.g., {female, male}. These distinct values are called classes. The classes are usually encoded by integer numbers, e.g., female = -1, male = 1, and thus y ∈ {-1, 1}. This particular encoding is often used for binary (two-class) labels, and the two classes are generically called the negative class and the positive class, respectively. For problems with more than two classes, a traditional encoding is y ∈ {1, . . . , C}, where C is the number of classes. In general, such an encoding does not imply structure in the classes. That is to say, the two classes encoded by y = 1 and y = 2 are not necessarily closer than the two classes y = 1 and y = 3. Labels may also take continuous values in R. For example, one may attempt to predict the blood pressure of little green aliens based on their height and weight.
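The two discrete encodings above can be sketched in a few lines of code. This is a minimal illustration, not part of the text: the class names ("adult", "juvenile", and the color names) are invented for the example, and the choice of which class maps to which integer is arbitrary.

```python
# Sketch of the label encodings described above (hypothetical class names).

def encode_binary(labels, positive="adult"):
    """Encode two-class labels as y in {-1, +1}."""
    return [+1 if lab == positive else -1 for lab in labels]

def encode_multiclass(labels):
    """Encode C distinct classes as y in {1, ..., C}.
    The integers impose no structure: class 1 is not
    'closer' to class 2 than it is to class 3."""
    classes = sorted(set(labels))                 # the C class names
    index = {c: i + 1 for i, c in enumerate(classes)}
    return [index[lab] for lab in labels]

print(encode_binary(["adult", "juvenile", "adult"]))        # [1, -1, 1]
print(encode_multiclass(["red", "green", "blue", "green"])) # [3, 2, 1, 2]
```

Note that `encode_multiclass` assigns integers by sorted class name, which is one arbitrary convention among many; any bijection between classes and {1, . . . , C} would serve.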
In supervised learning, the training sample consists of pairs, each containing an instance x and a label y: {(x_i, y_i)}_{i=1}^n. One can think of y as the label on x provided by a teacher, hence the name supervised learning. Such (instance, label) pairs are called labeled data, while instances alone without labels (as in unsupervised learning) are called unlabeled data. We are now ready to define supervised learning.
Definition 1.8. Supervised learning. Let the domain of instances be X, and the domain of labels be Y. Let P(x, y) be an (unknown) joint probability distribution on instances and labels X × Y. Given a training sample {(x_i, y_i)}_{i=1}^n drawn i.i.d. from P(x, y), supervised learning trains a function f : X → Y in some function family F, with the goal that f(x) predicts the true label y on future data x, where (x, y) is drawn i.i.d. from P(x, y) as well.
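Definition 1.8 can be made concrete with a toy sketch. Here the function family F is taken to be 1-nearest-neighbor classifiers, which is an assumption made for illustration only; the training pairs are invented (weight, height) instances labeled juvenile (-1) or adult (+1), echoing the prediction task mentioned above.

```python
import math

# Toy instance of Definition 1.8 (all numbers invented):
# a labeled training sample {(x_i, y_i)}, and a function f that
# predicts the label of a new x. Here f is a 1-nearest-neighbor
# rule, one simple choice of function family F.

training_sample = [
    ((85.0, 42.0), -1),   # x = (weight lbs., height in.), y = -1 (juvenile)
    ((90.0, 45.0), -1),
    ((105.0, 64.0), +1),  # y = +1 (adult)
    ((110.0, 68.0), +1),
]

def f(x):
    """Predict the label of x as that of its nearest training instance."""
    nearest = min(training_sample,
                  key=lambda pair: math.dist(x, pair[0]))
    return nearest[1]

print(f((108.0, 66.0)))  # 1  (closest to the adult instances)
print(f((87.0, 43.0)))   # -1 (closest to the juvenile instances)
```

A real learner would of course choose f from a richer family and with some notion of minimizing expected error under P(x, y), but the ingredients of the definition, sample, function family, and prediction on future x, are all present.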