where p is the number of periods in the data; w is the window ratio; h is the number of hidden layer
neurons; b is the number of biases (one per computational element); and o is the number of outputs.
Therefore for the current implementation, the number of weights will always be:
Total Weights = (p ⋅ w) ⋅ 2 + 1 ⋅ 2 + 2 ⋅ 1 + 1 ⋅ 1
And the Observations to Weights ratio is:
Observations to Weights = (p ⋅ (1 − w)) / (p ⋅ w ⋅ h + b ⋅ h + h ⋅ o + b ⋅ o)
Therefore, for the chocolate manufacturer dataset, the Observations to Weights ratio is:
Observations to Weights = (38 ⋅ (1 − 0.05)) / (38 ⋅ 0.05 ⋅ 2 + 1 ⋅ 2 + 2 ⋅ 1 + 1 ⋅ 1) = 4.10
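For clarity, this calculation can be reproduced with a few lines of code (an illustrative sketch; the function and variable names are ours and do not come from the original implementation):

```python
def observations_to_weights(p, w, h=2, b=1, o=1):
    """Total weight count and observations-to-weights ratio for a single
    hidden-layer network with p*w inputs, h hidden neurons, o outputs,
    and b bias inputs per computational element."""
    # p*w is kept fractional here, exactly as in the formula above
    total_weights = p * w * h + b * h + h * o + b * o
    observations = p * (1 - w)
    return total_weights, observations / total_weights

weights, ratio = observations_to_weights(p=38, w=0.05)
print(round(weights, 1), round(ratio, 2))  # 8.8 4.1
```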
As with linear regression, where at least 10 observations per variable are desirable, there should be a minimum of 10 observations for each weight. This estimate varies with the expected complexity of the pattern being modeled: the more complex the expected pattern, or the noisier the data, the more observations per weight are required. Thus an observations-to-weights ratio of 4.1 is somewhat low. However, because time series forecasting is a function of past information, we determined that the window must include 2 or more past periods. For the current research project, our smallest data set contains 47 periods, 38 of which form the training set, so a window ratio of 5% represents 2 past periods. Our largest dataset contains 148 periods, 118 of which form the training set, representing a window of 6 periods.
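These window sizes can be reproduced as follows (an illustrative sketch; rounding the product up to the nearest whole period is our assumption, since the text states only the resulting counts of 2 and 6):

```python
import math

def window_periods(training_periods, window_ratio=0.05):
    """Number of past periods fed to the network for a given window ratio,
    rounded up to a whole period (38 * 0.05 -> 2, 118 * 0.05 -> 6)."""
    return math.ceil(training_periods * window_ratio)

print(window_periods(38))   # 2
print(window_periods(118))  # 6
```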
The transfer function we used in the hidden layer is the tan-sigmoid, which non-linearly maps values from an infinite range into the interval between -1 and 1; the output layer transfer function is linear. Additionally, each neuron has a bias input held constant at unity.
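As a rough illustration of this architecture (not the study's actual code), a forward pass through a tan-sigmoid hidden layer and a linear output layer with unity bias inputs could be written as:

```python
import numpy as np

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """One forward pass: tan-sigmoid hidden layer, linear output layer.
    x is the scaled input window; each neuron adds its bias weight times
    a constant input of 1, as described above."""
    hidden = np.tanh(W_hidden @ x + b_hidden * 1.0)  # tan-sigmoid activation
    return W_out @ hidden + b_out * 1.0              # linear output

# Example shapes for a 2-period window, 2 hidden neurons, 1 output
x = np.array([0.1, -0.4])
W_hidden = np.random.randn(2, 2) * 0.1
b_hidden = np.random.randn(2) * 0.1
W_out = np.random.randn(1, 2) * 0.1
b_out = np.random.randn(1) * 0.1
print(forward(x, W_hidden, b_hidden, W_out, b_out))
```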
The relevant aspects of the supply chain demand modeling neural network are displayed in Figure 9. In this figure, the sum is represented by sigma (Σ), and the tan-sigmoid and linear transfer functions are represented by their respective curve symbols. All of the inputs to the neural network, as well as the outputs, are individually scaled between -1 and 1 to ensure that they are within the appropriate range for neural network training. The final results are then un-scaled to permit comprehensible analysis and use.
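A minimal sketch of such scaling and un-scaling is shown below (our own illustration; a min-max mapping is assumed, as the text specifies only the target range of -1 to 1):

```python
import numpy as np

def scale_to_unit_range(series):
    """Map a series onto [-1, 1] and return the parameters needed to undo it."""
    lo, hi = series.min(), series.max()
    scaled = 2.0 * (series - lo) / (hi - lo) - 1.0
    return scaled, (lo, hi)

def unscale(scaled, params):
    """Invert scale_to_unit_range so forecasts are readable in original units."""
    lo, hi = params
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

demand = np.array([120.0, 95.0, 140.0, 110.0])
scaled, params = scale_to_unit_range(demand)
print(scaled)                    # values in [-1, 1]
print(unscale(scaled, params))   # original demand values
```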
The first implementation of neural networks is based on the traditional backpropagation algorithm.
The structure of neural networks must be defined in advance by specifying such parameters as the
number of hidden layers and the neurons within each hidden layer. Other settings that must be defined
relate to the learning algorithm, e.g., the learning rate and the momentum.
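As an illustration only (scikit-learn is our choice of library here, not necessarily the one used in the study, and the numeric values are placeholders), these structural and learning parameters would be fixed before training along the following lines:

```python
from sklearn.neural_network import MLPRegressor

# Structure and learning settings defined in advance, as described above:
# one hidden layer with two tan-sigmoid neurons, a linear output, and an
# SGD-style learner with an explicit learning rate and momentum.
model = MLPRegressor(
    hidden_layer_sizes=(2,),   # one hidden layer, two neurons
    activation="tanh",         # tan-sigmoid hidden layer
    solver="sgd",
    learning_rate_init=0.01,   # illustrative value, not from the study
    momentum=0.9,              # illustrative value, not from the study
    max_iter=2000,
)
```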
Setting a constant learning rate for the entire training session is not desirable, because the ideal learning rate may change as the network's learning progresses. An adaptive, variable learning rate training algorithm has therefore been adopted, which adjusts the learning rate to the current learning error space (Hagan, Demuth, & Beale, 1996). This algorithm tries to maximize the learning rate subject to stable learning, thus adapting to the complexity of the local error surface. For example, if the descent path to the lowest error is straight and simple, the learning rate will be high; if the descent path is variable, complicated, and unclear, the learning rate will be very small to permit more stable learning.
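A simplified sketch of this kind of rule (our illustration, not the algorithm of Hagan, Demuth, & Beale, and with illustrative parameter values) is: after each epoch, compare the new error with the previous one, grow the learning rate while learning remains stable, and shrink it, discarding the step, when the error rises beyond a tolerance.

```python
def adapt_learning_rate(lr, prev_error, new_error,
                        increase=1.05, decrease=0.7, max_error_growth=1.04):
    """Simplified adaptive learning-rate rule: raise the rate while the error
    keeps falling, cut it back (and signal that the weight update should be
    rejected) when the error grows beyond the allowed tolerance."""
    if new_error > prev_error * max_error_growth:
        return lr * decrease, False   # unstable step: shrink rate, reject update
    return lr * increase, True        # stable step: grow rate, keep update
```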