where $(f(\mathbf{x}) \neq y)$ is 1 if $f$ predicts a different class than $y$ on $\mathbf{x}$, and 0 otherwise. For regression, one
commonly used loss function is the squared loss $c(\mathbf{x}, y, f(\mathbf{x})) = (f(\mathbf{x}) - y)^2$:
\[
\frac{1}{n} \sum_{i=1}^{n} \big(f(\mathbf{x}_i) - y_i\big)^2. \tag{1.6}
\]
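As a small illustration, here is one way these training sample errors could be computed in Python; the helper names are ours, NumPy is assumed, and f stands for any callable that maps a single instance to a prediction.

import numpy as np

def training_error_01(f, X, y):
    # 0-1 loss: fraction of training instances on which f predicts a class different from the label.
    predictions = np.array([f(x) for x in X])
    return np.mean(predictions != np.asarray(y))

def training_error_squared(f, X, y):
    # Squared loss, Equation (1.6): (1/n) * sum_i (f(x_i) - y_i)^2 over the training sample.
    predictions = np.array([f(x) for x in X])
    return np.mean((predictions - np.asarray(y)) ** 2)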
Other loss functions will be discussed as we encounter them later in the topic.
It might be tempting to seek the f that minimizes training sample error. However, this
strategy is flawed: such an f will tend to overfit the particular training sample. That is, it will likely
fit itself to the statistical noise in the particular training sample. It will learn more than just the
true relationship between X and Y. Such an overfitted predictor will have small training sample
error, but is likely to perform less well on future test data. A sub-area within machine learning called
computational learning theory studies the issue of overfitting. It establishes rigorous connections
between the training sample error and the true error, using a formal notion of complexity such as
the Vapnik-Chervonenkis dimension or Rademacher complexity. We provide a concise discussion
in Section 8.1. Informed by computational learning theory, one reasonable training strategy is to
seek an f that “almost” minimizes the training sample error, while regularizing f so that it is not too
complex in a certain sense. Interested readers can find the references in the bibliographical notes.
To estimate f's future performance, one can use a separate sample of labeled instances, called
the test sample: $\{(\mathbf{x}_j, y_j)\}_{j=n+1}^{n+m} \overset{i.i.d.}{\sim} P(\mathbf{x}, y)$. A test sample is not used during training, and therefore
provides a faithful (unbiased) estimation of future performance.
Definition 1.12. Test sample error. The corresponding test sample error for classification with
0-1 loss is
\[
\frac{1}{m} \sum_{j=n+1}^{n+m} \big(f(\mathbf{x}_j) \neq y_j\big), \tag{1.7}
\]
and for regression with squared loss is
\[
\frac{1}{m} \sum_{j=n+1}^{n+m} \big(f(\mathbf{x}_j) - y_j\big)^2. \tag{1.8}
\]
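To make the train/test separation concrete, here is a minimal sketch of estimating the test sample error of Equation (1.7) in Python; train_classifier is a hypothetical training routine, and the other names are ours.

import numpy as np

def test_error_01(f, X_test, y_test):
    # Equation (1.7): fraction of held-out test instances that f misclassifies.
    predictions = np.array([f(x) for x in X_test])
    return np.mean(predictions != np.asarray(y_test))

# Illustrative usage: the first n labeled instances train the classifier,
# while the remaining m are held out and touched only at evaluation time.
#   f = train_classifier(X[:n], y[:n])              # hypothetical training routine
#   error = test_error_01(f, X[n:n + m], y[n:n + m])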
In the remainder of the topic, we focus on classification due to its prevalence in semi-supervised
learning research. Most ideas discussed also apply to regression, though.
As a concrete example of a supervised learning method, we now introduce a simple classification algorithm: k-nearest-neighbor (kNN).
Algorithm 1.13. k-nearest-neighbor classifier.
Input: Training data (x_1, y_1), ..., (x_n, y_n); distance function d();
number of neighbors k; test instance x
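One way such a classifier can be implemented is sketched below in Python, assuming the common choices of Euclidean distance for d() and a majority vote among the labels of the k nearest training instances; the function name is ours.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k, d=None):
    # Predict the class of test instance x from its k nearest training instances.
    if d is None:
        # Assumed default distance function d(): Euclidean distance.
        d = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
    # Distance from the test instance to every training instance.
    distances = [d(xi, x) for xi in X_train]
    # Indices of the k closest training instances.
    nearest = np.argsort(distances)[:k]
    # Majority class among the k nearest labels (ties resolved by whichever class is seen first).
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

Illustrative usage: y_hat = knn_predict(X_train, y_train, x, k=3).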
 