[Plot: y = tanh(wx) for w = 0.1, w = 1, and w = 10; x from -3 to 3, y from -1 to 1]
Fig. 2.13. Function y = tanh(wx) for three values of w
Fig. 2.14. Norm of the vector of parameters during training
where q is the number of parameters of the classifier, and α is a hyperparameter whose value must be chosen as a tradeoff: if α is too large, the minimization drives the parameter values toward zero irrespective of the modeling error; conversely, if α is too small, the regularization term has no impact on training, so overfitting may occur.
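The regularization term itself precedes this excerpt; a standard weight-decay form consistent with the description above (a plausible reconstruction, not the original equation) is

J′ = J + (α/2) ∑_{i=1}^{q} w_i²,

where the sum runs over all q parameters of the network.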
The operation of the method is very simple: the gradient of J is com-
puted by backpropagation, and the contribution of the regularization term is
subsequently added,
∇J′ = ∇J + α w.
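As an illustration, the following sketch applies this update to a single tanh unit y = tanh(w · x), echoing Fig. 2.13. The synthetic data, the value of α, and the learning rate are illustrative assumptions; the gradient of J is computed analytically here, whereas a multilayer network would obtain it by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 inputs
t = np.tanh(X @ np.array([0.5, -1.0, 2.0]))    # synthetic targets

w = rng.normal(size=3)   # parameter vector (q = 3)
alpha = 0.01             # regularization hyperparameter (illustrative value)
lr = 0.1                 # learning rate (illustrative value)

for epoch in range(200):
    y = np.tanh(X @ w)
    # Gradient of the least-squares modeling error J with respect to w;
    # backpropagation would yield the same quantity in a multilayer network.
    grad_J = X.T @ ((y - t) * (1.0 - y**2)) / len(X)
    # Contribution of the regularization term, added after the gradient of J:
    # grad J' = grad J + alpha * w
    w -= lr * (grad_J + alpha * w)

print("trained weights:", w, "norm:", np.linalg.norm(w))
```

Recording the norm of w at each epoch makes the effect of α visible: a larger α yields a smaller final ‖w‖, which is the behavior plotted in Fig. 2.14.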
Nevertheless, it should be noted that the parameters of the network have different effects: