print(params)
print(metrics)

The output of the preceding code:

[0.01, 0.025, 0.05, 0.1, 0.5]
[1.4869656275309227, 1.4189071944747715, 1.5027293911925559, 1.5384660954019973, nan]
Now, we can see why we avoided using the default step size when we originally trained the linear model. The default is set to 1.0, which, in this case, results in a nan output for the RMSLE metric. This typically means that the SGD model has converged to a very poor local minimum in the error function that it is optimizing. This can happen when the step size is relatively large, because it is easier for the optimization algorithm to overshoot good solutions.
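This overshooting effect can be seen even without Spark. The following is a hypothetical, minimal sketch (not the book's MLlib code) using plain gradient descent on a tiny one-dimensional least-squares problem: a small step size converges to the true weight, while a step size of 1.0 makes the iterates blow up, analogous to the nan RMSLE above.

```python
def gd_weight(step_size, num_iterations, xs, ys):
    """Full-batch gradient descent for the model y ~ w * x with squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(num_iterations):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= step_size * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the true weight is 2.0

small = gd_weight(0.05, 100, xs, ys)  # converges to roughly 2.0
large = gd_weight(1.0, 100, xs, ys)   # step too large: the weight explodes
print(small)
print(large)
```

With `step_size=1.0`, each update multiplies the error in the weight by a factor larger than one in magnitude, so after 100 iterations the weight is astronomically large; downstream metrics computed from such a model naturally degenerate to nan or infinity.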
We can also see that for low step sizes and a relatively low number of iterations (we used 10 here), the model performance is slightly poorer. However, in the preceding Iterations section, we saw that for the lower step-size settings, a higher number of iterations will generally converge to a better solution.
Generally speaking, setting the step size and the number of iterations involves a trade-off. A lower step size means that convergence is slower but somewhat more assured. However, it requires a higher number of iterations, which is more costly in terms of computation and time, particularly at very large scale.
Tip
Selecting the best parameter settings can be an intensive process that involves training a model on many combinations of parameter settings and selecting the best outcome. Each instance of model training involves a number of iterations, so this process can be very expensive and time consuming when performed on very large datasets.
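The combination search described in the tip can be sketched in a few lines. This is a hypothetical illustration (again plain Python rather than the book's Spark code): train one small model per (step size, iterations) pair and keep the pair with the lowest RMSE on the training data.

```python
import math

def train(step_size, num_iterations, xs, ys):
    """Gradient descent for y ~ w * x, as a stand-in for model training."""
    w = 0.0
    n = len(xs)
    for _ in range(num_iterations):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= step_size * grad
    return w

def rmse(w, xs, ys):
    return math.sqrt(sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Grid search: one full training run per parameter combination,
# which is why this gets expensive on large datasets.
best = min(
    ((s, n, rmse(train(s, n, xs, ys), xs, ys))
     for s in [0.01, 0.025, 0.05, 0.1]
     for n in [10, 50, 100]),
    key=lambda t: t[2],
)
print(best)  # (step_size, num_iterations, rmse) of the best combination
```

In practice, the evaluation would be done on a held-out set rather than the training data, and each grid cell would be a full (and costly) distributed training run.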
The output is plotted here, again using a log scale for the step-size axis: