Depending on the domain of the label y, supervised learning problems are further divided into classification and regression:

Definition 1.9. Classification. Classification is the supervised learning problem with discrete classes Y. The function f is called a classifier.

Definition 1.10. Regression. Regression is the supervised learning problem with continuous Y. The function f is called a regression function.
What exactly is a good f? The best f is by definition

f* = argmin_{f ∈ F} E_{(x,y)~P} [ c(x, y, f(x)) ],    (1.3)

where argmin means "finding the f that minimizes the following quantity". E_{(x,y)~P}[·] is the expectation over random test data drawn from P. Readers not familiar with this notation may wish to consult Appendix A. c(·) is a loss function that determines the cost or impact of making a prediction f(x) that is different from the true label y. Some typical loss functions will be discussed shortly. Note we limit our attention to some function family F, mostly for computational reasons. If we remove this limitation and consider all possible functions, the resulting f* is the Bayes optimal predictor, the best one can hope for on average. For the distribution P, this function will incur the lowest possible loss when making predictions. The quantity

E_{(x,y)~P} [ c(x, y, f*(x)) ]

is known as the Bayes error. However, the Bayes optimal predictor may not be in F in general. Our goal is to find the f ∈ F that is as close to the Bayes optimal predictor as possible.
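To make the Bayes optimal predictor concrete, here is a minimal sketch, not from the text, using a hypothetical fully known discrete joint distribution P(x, y). Under 0-1 loss, the Bayes optimal predictor simply picks the most probable label for each x, and the Bayes error is the probability mass on which even that predictor is wrong.

```python
# Hypothetical joint distribution P(x, y) over x in {0, 1}, y in {-1, +1}.
# (Assumed numbers for illustration only.)
P = {
    (0, -1): 0.4, (0, +1): 0.1,
    (1, -1): 0.2, (1, +1): 0.3,
}

def bayes_predictor(x):
    """Return the label maximizing P(y | x), i.e. minimizing expected 0-1 loss."""
    return max((-1, +1), key=lambda y: P[(x, y)])

# Bayes error: total probability of (x, y) pairs the best predictor still misses.
bayes_error = sum(p for (x, y), p in P.items() if y != bayes_predictor(x))
print(bayes_predictor(0), bayes_predictor(1), bayes_error)
```

Note that this construction needs the true P, which is exactly what is unavailable in practice, as discussed next.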
It is worth noting that the underlying distribution P(x, y) is unknown to us. Therefore, it is not possible to directly find the best f, or even to measure any predictor f's performance, for that matter. Here lies the fundamental difficulty of statistical machine learning: one has to generalize the prediction from a finite training sample to any unseen test data. This is known as induction.

To proceed, a seemingly reasonable approximation is to gauge f's performance using the training sample error. That is, we replace the unknown expectation by the average over the training sample:
Definition 1.11. Training sample error. Given a training sample {(x_i, y_i)}_{i=1}^n, the training sample error is

(1/n) Σ_{i=1}^n c(x_i, y_i, f(x_i)).    (1.4)
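Definition 1.11 is straightforward to compute. The following sketch (with an assumed toy sample and predictor, not from the text) evaluates Eq. (1.4) for a regression function under the squared loss c(x, y, f(x)) = (f(x) - y)^2:

```python
def training_error(sample, f, c):
    """Average loss of predictor f over the training sample, Eq. (1.4)."""
    return sum(c(x, y, f(x)) for x, y in sample) / len(sample)

# Squared loss, a typical choice for regression.
squared_loss = lambda x, y, fx: (fx - y) ** 2

# Hypothetical training sample {(x_i, y_i)} and predictor f(x) = 2x.
sample = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]
f = lambda x: 2 * x
print(training_error(sample, f, squared_loss))
```

The same `training_error` helper works for any loss function c, including the 0-1 loss introduced below for classification.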
For classification, one commonly used loss function is the 0-1 loss c(x, y, f(x)) ≡ (f(x) ≠ y), where (z) is 1 if the predicate z is true and 0 otherwise. The training sample error is then the fraction of misclassified training instances:

(1/n) Σ_{i=1}^n (f(x_i) ≠ y_i).    (1.5)
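As a minimal sketch of Eq. (1.5) (with assumed data, not from the text), the 0-1 training error is just the fraction of training points a classifier gets wrong:

```python
def zero_one_training_error(sample, f):
    """Fraction of (x_i, y_i) with f(x_i) != y_i, per Eq. (1.5)."""
    return sum(1 for x, y in sample if f(x) != y) / len(sample)

# Hypothetical labeled sample and a simple threshold classifier.
sample = [(0.2, -1), (0.7, 1), (0.4, -1), (0.9, 1), (0.6, -1)]
f = lambda x: 1 if x > 0.5 else -1
print(zero_one_training_error(sample, f))
```

Here the classifier misses one of the five points (x = 0.6), so the 0-1 training error is 1/5.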