We can calculate the total number of weights as:

$$\text{Total Weights} = p \cdot w \cdot h + b \cdot h + h \cdot h + h \cdot o + b \cdot o$$

Therefore, for the current implementation (substituting h = 2, b = 1, and o = 1), the number of weights will always be:

$$\text{Total Weights} = p \cdot w \cdot 2 + 1 \cdot 2 + 2 \cdot 2 + 2 \cdot 1 + 1 \cdot 1$$
For example, for the chocolate manufacturer dataset, the observations-to-weights ratio is:

$$\text{Observations to Weights} = \frac{38 \cdot (1 - 0.05)}{38 \cdot 0.05 \cdot 2 + 1 \cdot 2 + 2 \cdot 2 + 2 \cdot 1 + 1 \cdot 1} = 2.82$$
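The same arithmetic can be checked with a short script. The sketch below is purely illustrative: the function and variable names (total_weights, observations, window_ratio) are ours, the hidden, bias, and output counts are fixed to the values substituted above, and the comments reflect our reading of each term.

```python
# Illustrative check of the weight count and observations-to-weights ratio
# for the recurrent network configuration used in the text (h = 2, b = 1, o = 1).

def total_weights(observations, window_ratio, hidden=2, bias=1, outputs=1):
    inputs = observations * window_ratio          # p * w input neurons
    return (inputs * hidden                       # input-to-hidden weights
            + bias * hidden                       # hidden biases
            + hidden * hidden                     # recurrent hidden-to-hidden weights
            + hidden * outputs                    # hidden-to-output weights
            + bias * outputs)                     # output bias

def observations_to_weights(observations, window_ratio):
    numerator = observations * (1 - window_ratio) # numerator of the ratio above
    return numerator / total_weights(observations, window_ratio)

# Chocolate manufacturer dataset: 38 observations, window size ratio of 0.05
print(round(observations_to_weights(38, 0.05), 2))  # -> 2.82
```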
Naturally, the observations-to-weights ratio is even lower than the 4.1 previously identified for the
neural network without the recurrent connections. However, the window size ratio and the number of
hidden layer neurons should not be further reduced since they are already at their lowest meaningful
levels.
Support Vector Machine details
The support vector machine software implementation selected for the current experiment was mySVM (Rüping, 2005), which is based on the SVMLight optimization algorithm (Joachims, 1999). The inner product kernel was used, and the complexity constant was determined automatically using a cross-validation procedure.
Two cross-validation procedures were tested. The first was a simple 10-fold cross-validation that ignores the time direction of the data: for each of 10 iterations, 9/10ths of the data were used to build a model and the remaining 1/10th was used to test its accuracy. The second, called windowed cross-validation, simulated time-ordered predictions. This procedure split the training data set into 10 parts; the algorithm trained the model on 5 consecutive parts and tested it on the following part. This 5-part window was moved along the data, so the procedure was repeated 5 times: blocks 1-5 were used to train and the model was tested on block 6, then blocks 2-6 were used to train and the model was tested on block 7, and so on.
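The windowed scheme can be expressed as a short splitting routine. The following Python fragment is a minimal sketch of the splits just described, assuming the data are simply cut into 10 consecutive blocks; it is not taken from the mySVM or SVMLight code.

```python
# Minimal sketch of the windowed (time-ordered) cross-validation splits:
# 10 consecutive blocks, train on a 5-block window, test on the next block.

import numpy as np

def windowed_cv_splits(n_samples, n_blocks=10, window=5):
    blocks = np.array_split(np.arange(n_samples), n_blocks)
    for start in range(n_blocks - window):        # 5 iterations for 10 blocks
        train_idx = np.concatenate(blocks[start:start + window])
        test_idx = blocks[start + window]
        yield train_idx, test_idx

# Blocks 1-5 train / block 6 test, blocks 2-6 train / block 7 test, and so on.
for train_idx, test_idx in windowed_cv_splits(n_samples=38):
    print(train_idx[0], "-", train_idx[-1], "->", test_idx[0], "-", test_idx[-1])
```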
The errors of these five models were averaged, and the complexity constant with the smallest cross-validation error was selected as the level of complexity that provided the best generalization. Increasing the complexity constant from a very small value, which does not model the data well, to a very large value, which overfits the data, produces an error curve from which the generalization error can be minimized. An example error curve for the complexity constant search, using 10-fold cross-validation with a 5-fold sliding window and a complexity constant range from 0.00000001 to 100 with a multiplicative step of 1.1, is presented in Figure 13.
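As an illustration of this search, the sketch below scans the complexity constant over the stated range with a multiplicative step of 1.1 and keeps the value with the smallest average error on the windowed splits. It uses scikit-learn's SVR with a linear (inner product) kernel as a stand-in for mySVM, so the API and the mean-squared-error measure are our assumptions, not details of the original experiment.

```python
# Illustrative complexity constant (C) search over the windowed splits above;
# scikit-learn's SVR with a linear kernel stands in for mySVM here.

import numpy as np
from sklearn.svm import SVR

def search_complexity(X, y, splits, c_min=1e-8, c_max=100.0, step=1.1):
    splits = list(splits)                         # reuse the same splits for every C
    best_c, best_err = None, np.inf
    c = c_min
    while c <= c_max:
        fold_errors = []
        for train_idx, test_idx in splits:
            model = SVR(kernel="linear", C=c).fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            fold_errors.append(np.mean((pred - y[test_idx]) ** 2))
        mean_err = float(np.mean(fold_errors))
        if mean_err < best_err:                   # keep the C with the smallest CV error
            best_c, best_err = c, mean_err
        c *= step                                 # multiplicative step of 1.1
    return best_c, best_err
```

Plotting the averaged error against each tested value of the complexity constant during such a scan yields an error curve of the kind shown in Figure 13.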
In Figure 14 we also present examples of the model underfitting the data with a complexity constant of 0.00000001, overfitting with 1000, and the optimal estimated generalization fit with a complexity constant of 0.012154154. In this diagram, we can see that the support vector machine with a very low complexity constant merely reproduces the average of the training set and thus offers little predictive power. The very high complexity constant memorizes the training set, as can be seen in the diagram where the high-complexity forecast overlaps the actual demand in the training set (period 1