sions: ones with the pre-set parameter, and "automatic" versions, in which the parameters that
resulted in the most accurate forecasts on the training set were calculated.
Although, as mentioned previously, exponential smoothing performs well in many forecasting
problems, the choice of the initial value may have a significant impact on the accuracy of its forecasts.
The exponential smoothing implementations in the MATLAB Financial Toolbox (MathWorks, 2005a) and
in Excel use the first observation as the initial value, whereas other implementations, such as SPSS, use
the series average as the starting value. For this reason, we implemented both approaches in our exponential
smoothing and Theta models. The main purpose of the Multiple Linear Regression model is to provide
a linear benchmark for all of the auto-regressive type models, such as the neural networks and support
vector machines.
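As an illustration of the two initialization conventions discussed above, the following Python sketch (our own, not the MATLAB, Excel, or SPSS code; the function name and the alpha value are arbitrary) computes simple exponential smoothing with either starting value:

    import numpy as np

    def exponential_smoothing(y, alpha, init="first"):
        # Choose the starting level: the first observation (MATLAB/Excel
        # convention) or the series average (SPSS convention).
        level = y[0] if init == "first" else float(np.mean(y))
        for value in y:
            level = alpha * value + (1 - alpha) * level
        return level  # the one-step-ahead forecast is the final level

    y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])
    print(exponential_smoothing(y, alpha=0.3, init="first"))
    print(exponential_smoothing(y, alpha=0.3, init="mean"))

On short series the two starting values can produce noticeably different forecasts, which is why both variants were retained.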
ARMA
The ARMA model combines an auto-regressive (AR) forecast with a moving-average (MA) forecast (Box et
al., 1994). To minimize the error, we optimized the lag used in the auto-regressive portion and the lag
used in the moving-average portion. This functionality is provided by the MATLAB GARCH Toolbox
(MathWorks, 2005b). The ARMAX model is optimized to minimize the error using the Optimization
Toolbox (MathWorks, 2005e). Only the ARMA part of the ARMAX model was used in the current
experiments.
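The lag optimization can be sketched in Python with statsmodels (the chapter itself used the MATLAB GARCH Toolbox; the grid bounds below are arbitrary, and AIC stands in here for the chapter's training-set error criterion):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def fit_best_arma(y, max_p=3, max_q=3):
        # Grid-search the AR lag p and the MA lag q, keeping the
        # best-scoring fitted model.
        best = (np.inf, None, None)
        for p in range(1, max_p + 1):
            for q in range(0, max_q + 1):
                try:
                    res = ARIMA(y, order=(p, 0, q)).fit()
                except Exception:
                    continue  # skip orders that fail to estimate
                if res.aic < best[0]:
                    best = (res.aic, res, (p, q))
        return best[1], best[2]  # fitted results object and (p, q)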
Theta Model
We used the version of the Theta model (Assimakopoulos & Nikolopoulos, 2000) employed in the M3 fore-
casting competition. First, the linear trend was calculated; then exponential smoothing was performed
on double the difference between the raw data and the trend values, with the smoothing parameter
chosen to minimize the error on the training set. The two individual series, the linear trend and the
optimized exponential smoothing of the decomposed series, were recombined by averaging the two. As
already mentioned, we implemented both versions of the Theta model: one with the first observation of
the time series as the initialization value, and the other with the average of the training set as the
initialization value.
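A minimal Python sketch of this decomposition, reconstructed from the description above (it is not the competition code, and in practice alpha would be optimized on the training set rather than passed in), might read:

    import numpy as np

    def theta_forecast(y, alpha, horizon=1, init="first"):
        t = np.arange(len(y))
        slope, intercept = np.polyfit(t, y, 1)   # theta = 0 line: linear trend
        trend = intercept + slope * t
        theta2 = trend + 2.0 * (y - trend)       # double the detrended residuals

        # Simple exponential smoothing of the theta = 2 line, with the
        # selectable initial value discussed earlier.
        level = theta2[0] if init == "first" else float(theta2.mean())
        for value in theta2:
            level = alpha * value + (1 - alpha) * level

        future_t = len(y) + np.arange(horizon)
        trend_fc = intercept + slope * future_t  # extrapolated trend forecast
        ses_fc = np.full(horizon, level)         # flat SES forecast
        return (trend_fc + ses_fc) / 2.0         # equal-weight recombination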
Neural Network Details
Neural networks, while being universal approximators, may suffer from the "overfitting" problem, i.e.,
building complex non-linear mappings where simpler ones are actually required. Overfitting
leads to poor generalization and can be combated by adding more data to the training set or by keeping
the learning power (size) of the network low. Setting the window size to 5% of the training set for
the regular time series models results in a ratio of 1 input to 20 observations. Therefore, to pro-
vide an appropriate level of non-linearity and additional modeling power, we created one hidden layer
containing 2 neurons with non-linear transfer functions. Even then, with the small datasets, there is still
a danger of overfitting the data. The total number of weights for a neural network with one hidden layer
can be calculated as follows:
Total Weights = p_w ⋅ h + b_h ⋅ h + h ⋅ o + b_o ⋅ o,

where p_w is the number of network inputs (the window size), h is the number of hidden neurons, o is the number of outputs, and b_h and b_o are the bias inputs (each equal to 1) of the hidden and output layers.
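For example, with a hypothetical window of p_w = 5 inputs (the window size here is illustrative only) feeding the 2-neuron hidden layer described above (h = 2) and a single output (o = 1), the network would have Total Weights = 5 ⋅ 2 + 1 ⋅ 2 + 2 ⋅ 1 + 1 ⋅ 1 = 15 adjustable weights.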