to be overfitting, and can be expected to be a good predictor. In this section we discuss some other
algorithms which explicitly enforce hypothesis agreement, without requiring explicit feature splits
or the iterative mutual-teaching procedure. To understand these algorithms, we need to introduce
the regularized risk minimization framework for machine learning.
Recall that, in general, we can define a loss function to specify the cost of mistakes in prediction:
Definition 4.3. Loss Function. Let x ∈ X be an instance, y ∈ Y its true label, and f(x) our prediction. A loss function c(x, y, f(x)) ∈ [0, ∞) measures the amount of loss, or cost, of this prediction.
For example, in regression we can define the squared loss c(x, y, f(x)) = (y − f(x))². In classification we can define the 0/1 loss as c(x, y, f(x)) = 1 if y ≠ f(x), and 0 otherwise. The loss function can differ for different types of misclassification. In medical diagnosis we might use c(x, y = healthy, f(x) = diseased) = 1 and c(x, y = diseased, f(x) = healthy) = 100. The loss function can also depend on the instance x: the same amount of medical prediction error might incur a higher loss on an infant than on an adult.
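As a concrete illustration, these three losses can be written as small functions. A minimal sketch in Python (the function names are ours, not from the text):

    def squared_loss(y, fx):
        # Regression: c(x, y, f(x)) = (y - f(x))^2.
        return (y - fx) ** 2

    def zero_one_loss(y, fx):
        # Classification: loss 1 for a wrong prediction, 0 otherwise.
        return 1.0 if y != fx else 0.0

    def diagnosis_loss(y, fx):
        # Asymmetric medical loss from the example above: missing a
        # diseased patient costs 100, while a false alarm costs 1.
        if y == "diseased" and fx == "healthy":
            return 100.0
        if y == "healthy" and fx == "diseased":
            return 1.0
        return 0.0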
Definition 4.4. Empirical Risk. The empirical risk of f is the average loss incurred by f on a labeled training sample:

R(f) = (1/l) ∑_{i=1}^{l} c(x_i, y_i, f(x_i)).
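In code, the empirical risk is simply an average of per-instance losses. A minimal sketch, reusing a loss function such as zero_one_loss above (the helper name is ours):

    def empirical_risk(f, X, y, loss):
        # R(f): average loss c(x_i, y_i, f(x_i)) over the labeled sample.
        return sum(loss(y_i, f(x_i)) for x_i, y_i in zip(X, y)) / len(X)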
Applying the principle of empirical risk minimization (ERM), that is, finding the f that minimizes the empirical risk, may seem like a natural thing to do:

f_ERM = argmin_{f ∈ F} R(f),    (4.2)

where F is the set of all hypotheses we consider. For classification with 0/1 loss, ERM amounts to minimizing the training sample error.
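When F is finite, equation (4.2) can be carried out by direct search; the sketch below illustrates this with hypothetical 1-D threshold classifiers (the names erm and F are ours). For infinite hypothesis spaces one would instead use numerical optimization:

    def erm(F, X, y, loss):
        # Empirical risk of a hypothesis: average loss on the training sample.
        risk = lambda f: sum(loss(y_i, f(x_i)) for x_i, y_i in zip(X, y)) / len(X)
        # Eq. (4.2): return the hypothesis in F with the smallest empirical risk.
        return min(F, key=risk)

    # Example: three threshold classifiers f_t(x) = 1 if x > t, else 0.
    F = [lambda x, t=t: 1 if x > t else 0 for t in (0.0, 0.5, 1.0)]
    f_erm = erm(F, X=[0.2, 0.7, 1.4], y=[0, 1, 1],
                loss=lambda y, fx: float(y != fx))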
However, f_ERM can overfit the particular training sample. As a consequence, f_ERM is not necessarily the classifier in F with the smallest true risk on future data. One remedy is to regularize the empirical risk by a regularizer Ω(f). The regularizer Ω(f) is a non-negative functional, i.e., it takes a function f as input and outputs a non-negative real value. The value is such that if f is “smooth” or “simple” in some sense, Ω(f) will be close to zero; if f is too zigzagged (i.e., it overfits and attempts to pass through all labeled training instances), Ω(f) is large.
Definition 4.5. Regularized Risk. The regularized risk is the weighted sum of the empirical risk R(f) and the regularizer, with weight λ ≥ 0: R(f) + λΩ(f). The principle of regularized risk minimization is to find the f that minimizes the regularized risk:

f* = argmin_{f ∈ F} R(f) + λΩ(f).    (4.3)
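As one concrete instance of (4.3), take linear models f(x) = w·x with the squared loss and the regularizer Ω(f) = ‖w‖², which penalizes large weight vectors. This is ridge regression (our choice of example, not the text's), and the minimizer has a closed form. A sketch under those assumptions:

    import numpy as np

    def ridge_fit(X, y, lam):
        # Minimizes (1/l) * sum_i (y_i - w.x_i)^2 + lam * ||w||^2.
        # Setting the gradient to zero gives the closed-form solution
        # w = (X^T X + lam * l * I)^{-1} X^T y.
        l, d = X.shape
        return np.linalg.solve(X.T @ X + lam * l * np.eye(d), X.T @ y)

Larger λ shrinks w toward zero, favoring a simpler f; λ = 0 recovers plain ERM.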