We can calculate the total number of weights as:

$$\text{Total Weights} = p \cdot w \cdot h + b \cdot h + h \cdot h + h \cdot o + b \cdot o$$

Therefore, for the current implementation (substituting h = 2, b = 1, and o = 1), the number of weights will always be:

$$\text{Total Weights} = p \cdot w \cdot 2 + 1 \cdot 2 + 2 \cdot 2 + 2 \cdot 1 + 1 \cdot 1$$
For example, for the chocolate manufacturer dataset, the observations-to-weights ratio is:

$$\text{Observations to Weights} = \frac{38 \cdot (1 - 0.05)}{38 \cdot 0.05 \cdot 2 + 1 \cdot 2 + 2 \cdot 2 + 2 \cdot 1 + 1 \cdot 1} = 2.82$$
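The same arithmetic can be checked with a short script. The sketch below is purely illustrative: the function and variable names (total_weights, observations, window_ratio) are ours, the hidden, bias, and output counts are fixed to the values substituted above, and the comments reflect our reading of each term.

```python
# Illustrative check of the weight count and observations-to-weights ratio
# for the recurrent network configuration used in the text (h = 2, b = 1, o = 1).

def total_weights(observations, window_ratio, hidden=2, bias=1, outputs=1):
    inputs = observations * window_ratio          # p * w input neurons
    return (inputs * hidden                       # input-to-hidden weights
            + bias * hidden                       # hidden biases
            + hidden * hidden                     # recurrent hidden-to-hidden weights
            + hidden * outputs                    # hidden-to-output weights
            + bias * outputs)                     # output bias

def observations_to_weights(observations, window_ratio):
    numerator = observations * (1 - window_ratio) # numerator of the ratio above
    return numerator / total_weights(observations, window_ratio)

# Chocolate manufacturer dataset: 38 observations, window size ratio of 0.05
print(round(observations_to_weights(38, 0.05), 2))  # -> 2.82
```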
Naturally, the observations-to-weights ratio is even lower than the 4.1 previously identified for the
neural network without the recurrent connections. However, the window size ratio and the number of
hidden layer neurons should not be further reduced since they are already at their lowest meaningful
levels.
Support Vector Machine details
The support vector machine software implementation selected for the current experiment was mySVM (Rüping, 2005), which is based on the SVMLight optimization algorithm (Joachims, 1999). The inner product kernel was used, and the complexity constant was determined automatically using a cross-validation procedure.
Two cross-validation procedures were tested. The first was a simple 10-fold cross-validation that ignores the time direction of the data: for each of 10 iterations, 9/10ths of the data were used to build a model and the remaining 1/10th was used to test its accuracy. The second, called windowed cross-validation, simulated time-ordered predictions. This procedure split the training data set into 10 parts; the algorithm trained the model on 5 consecutive parts and tested it on the following part. This 5-part window was moved along the data, so the procedure was repeated 5 times: blocks 1-5 were used to train and the model was tested on block 6, then blocks 2-6 were used to train and the model was tested on block 7, and so on.
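The windowed scheme can be expressed as a short splitting routine. The following Python fragment is a minimal sketch of the splits just described, assuming the data are simply cut into 10 consecutive blocks; it is not taken from the mySVM or SVMLight code.

```python
# Minimal sketch of the windowed (time-ordered) cross-validation splits:
# 10 consecutive blocks, train on a 5-block window, test on the next block.

import numpy as np

def windowed_cv_splits(n_samples, n_blocks=10, window=5):
    blocks = np.array_split(np.arange(n_samples), n_blocks)
    for start in range(n_blocks - window):        # 5 iterations for 10 blocks
        train_idx = np.concatenate(blocks[start:start + window])
        test_idx = blocks[start + window]
        yield train_idx, test_idx

# Blocks 1-5 train / block 6 test, blocks 2-6 train / block 7 test, and so on.
for train_idx, test_idx in windowed_cv_splits(n_samples=38):
    print(train_idx[0], "-", train_idx[-1], "->", test_idx[0], "-", test_idx[-1])
```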
The errors of these five models were averaged, and the complexity constant with the smallest cross-validation error was selected as the level of complexity that provided the best generalization. Increasing the complexity constant from a very small value, which does not model the data well, to a very large value, which overfits the data, produces an error curve from which the generalization error can be minimized. An example error curve for the complexity constant search, using 10-fold cross-validation with a 5-fold sliding window and a complexity constant range from 0.00000001 to 100 with a multiplicative step of 1.1, is presented in Figure 13.
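As an illustration of this search, the sketch below scans the complexity constant over the stated range with a multiplicative step of 1.1 and keeps the value with the smallest average error on the windowed splits. It uses scikit-learn's SVR with a linear (inner product) kernel as a stand-in for mySVM, so the API and the mean-squared-error measure are our assumptions, not details of the original experiment.

```python
# Illustrative complexity constant (C) search over the windowed splits above;
# scikit-learn's SVR with a linear kernel stands in for mySVM here.

import numpy as np
from sklearn.svm import SVR

def search_complexity(X, y, splits, c_min=1e-8, c_max=100.0, step=1.1):
    splits = list(splits)                         # reuse the same splits for every C
    best_c, best_err = None, np.inf
    c = c_min
    while c <= c_max:
        fold_errors = []
        for train_idx, test_idx in splits:
            model = SVR(kernel="linear", C=c).fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            fold_errors.append(np.mean((pred - y[test_idx]) ** 2))
        mean_err = float(np.mean(fold_errors))
        if mean_err < best_err:                   # keep the C with the smallest CV error
            best_c, best_err = c, mean_err
        c *= step                                 # multiplicative step of 1.1
    return best_c, best_err
```

Plotting the averaged error against each tested value of the complexity constant during such a scan yields an error curve of the kind shown in Figure 13.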
In Figure 14 we also present examples of the model underfitting the data with a complexity constant of 0.00000001, overfitting with 1000, and the optimal estimated generalization fit with a complexity constant of 0.012154154. In this diagram, we can see that the support vector machine with a very low complexity constant merely reproduces the average of the training set and thus offers little predictive power. The very high complexity constant memorizes the training set, as can be seen in the diagram where the high-complexity forecast overlaps the actual demand in the training set (period 1