to 14); however, it generalizes very poorly on the testing set (periods 15+). The optimal complexity constant of 0.012154154, identified by the windowed cross-validation procedure described in the previous paragraphs, produces a forecast that reflects the level of pattern learning that generalizes best.
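As a rough illustration, the selection of the complexity constant can be sketched as a grid search over candidate values, each evaluated on held-out windowed examples. The Python sketch below is a minimal approximation only: it assumes scikit-learn's SVR, and the candidate grid, split point, and function names are illustrative rather than the exact procedure described above.

import numpy as np
from sklearn.svm import SVR

def windowed_examples(series, window):
    # Slide a fixed-size window over the series; each window of past
    # values predicts the value that immediately follows it.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

def select_complexity_constant(series, window, candidates):
    # Evaluate each candidate complexity constant C on the held-out
    # tail of the windowed examples and keep the best performer.
    X, y = windowed_examples(series, window)
    split = int(0.8 * len(X))          # assumed train/validation split
    best_c, best_err = None, float("inf")
    for c in candidates:
        model = SVR(C=c).fit(X[:split], y[:split])
        err = np.mean((model.predict(X[split:]) - y[split:]) ** 2)
        if err < best_err:
            best_c, best_err = c, err
    return best_c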
Super Wide Model
As indicated earlier by the observations-to-weights ratio, the available time series are very short, so there are not many examples for learning complex patterns. The separation of the data set into training, cross-validation and testing sets, together with the loss of periods due to windowing, further reduces the set of usable observations. Based on the assumption that several products of the same manufacturer probably have similar demand patterns, we introduced what we call a Super Wide Model. This method takes a wide selection of time series from the same problem domain and combines them into one large model, which effectively increases the number of training examples. This larger number of training examples permits an increase in input dimensionality (e.g., a larger window size) and in model complexity.
For example, in this experiment we consider 100 time series from each of the sources. With the Super Wide Model, we use the data from all 100 time series simultaneously to train the model. This provides a large number of training examples and permits us to greatly increase the window size, so that the models can look deep into the past data. Additionally, the approach could be used to look across other information sources that may be correlated with demand, such as category averages or the demand for complementary or substitute products.
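A minimal sketch of the pooling idea, in Python, assuming each product's demand history is available as a one-dimensional array (names are illustrative):

import numpy as np

def super_wide_training_set(all_series, window):
    # Pool sliding-window examples from every series into one data set:
    # each row of X is one window of past demand from some product, and
    # the corresponding y is the demand in the period that follows it.
    X_rows, y_rows = [], []
    for series in all_series:
        for i in range(len(series) - window):
            X_rows.append(series[i:i + window])
            y_rows.append(series[i + window])
    return np.array(X_rows), np.array(y_rows)

With 100 series of 38 training periods and a window of 19, this yields a single matrix of 1,900 examples rather than 100 separate sets of 19.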
For example, the chocolate factory data set contains 100 products and 47 periods of time series data. Once the training and testing sets are separated, 38 periods of data remain. For this type of model, we choose a window size of 50%, which balances modeling the demand behavior as a function of the past 50% of the data against using 50% of the data as examples. Using this large window size of 50% with a traditional time series model would provide a training set of only 19 examples for a window size of 19, which is not much data for identifying patterns that may recur in the future. With the Super Wide Model, however, we have 1,900 examples for a window size of 19, which is sufficient data to find the best forecasting patterns for the problem domain.
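The counts quoted above follow directly from the windowing arithmetic; a small check, in Python:

train_periods = 38                     # periods left after removing the test set
window = train_periods // 2            # 50% window size = 19
per_product = train_periods - window   # 19 examples per product
products = 100
print(per_product)                     # 19 with a traditional single-series model
print(per_product * products)          # 1900 when all products are pooled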
All of the models that learn from past demand, such as multiple linear regression, neural networks and support vector machines, will also be tested on the Super Wide Models. The only exception is the recurrent neural networks, because the necessary tools are not yet available. Although training a recurrent neural network on a Super Wide Model is feasible in principle, it would require resetting the recurrent connections for every product, because time-lagged signals between products would not make sense.
The neural network models were enlarged to 10 hidden-layer neurons, which, in combination with the very large window, results in large network sizes relative to the patterns to be detected. With a window size of 50% of the training data, we have a ratio of 1 input to 1 observation. We then multiply the number of observations by the 100 products (because of the Super Wide Model format) to calculate the observations-to-weights ratio for the chocolate manufacturer data set. The total number of weights can be calculated as:
Total Weights = p ⋅ w ⋅ h + b_h ⋅ h + h ⋅ o + b_o

where p is the number of input series presented per training example, w is the window size, h is the number of hidden neurons, o is the number of output neurons, and b_h and b_o denote the bias weights of the hidden and output layers.
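As a worked check of this formula in Python: with the bias terms taken as one weight per hidden and per output neuron, and under the reading that each pooled training example presents a single product's window (p = 1), the section's numbers give roughly a 9-to-1 observations-to-weights ratio. A p = 100 reading would instead correspond to feeding all product windows at once; the text does not fully pin this down, so the choice of p below is an assumption.

def total_weights(p, w, h, o):
    # p*w*h input-to-hidden weights, h hidden biases (b_h * h),
    # h*o hidden-to-output weights, and o output biases (b_o).
    return p * w * h + h + h * o + o

observations = 1900                             # pooled examples from this section
weights = total_weights(p=1, w=19, h=10, o=1)   # 211 weights
print(weights, observations / weights)          # ratio of roughly 9 to 1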