inadequate solutions may occur. A rule of thumb is to set the learning rate to 1/t, where t is the number of iterations through the training set so far.
Biases are updated by the following equations, where Δθ_j is the change in bias θ_j:

    Δθ_j = (l) Err_j                    (9.10)

    θ_j = θ_j + Δθ_j                    (9.11)
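To make the update step concrete, here is a minimal Python sketch, not from the text: the function name and data layout are illustrative. It applies the bias updates of Eqs. (9.10) and (9.11), the analogous weight update Δw_ij = (l) Err_j O_i, and the 1/t learning-rate rule of thumb mentioned above:

```python
def update_unit(weights, bias, inputs, err_j, t):
    """Case update for one unit j: each weight gets the increment
    (l) * Err_j * O_i, and the bias gets Delta(theta_j) = (l) * Err_j
    per Eqs. (9.10)-(9.11). Learning rate l = 1/t (rule of thumb)."""
    l = 1.0 / t
    new_weights = [w + l * err_j * o_i for w, o_i in zip(weights, inputs)]
    new_bias = bias + l * err_j
    return new_weights, new_bias

# e.g.: update_unit([0.2, -0.3], 0.1, [1.0, 0.0], err_j=0.05, t=3)
```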
Note that here we are updating the weights and biases after the presentation of each tuple. This is referred to as case updating. Alternatively, the weight and bias increments could be accumulated in variables, so that the weights and biases are updated after all the tuples in the training set have been presented. This latter strategy is called epoch updating, where one iteration through the training set is an epoch. In theory, the mathematical derivation of backpropagation employs epoch updating, yet in practice, case updating is more common because it tends to yield more accurate results.
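The contrast between the two strategies can be sketched as follows. Here `grad` stands for an assumed routine, not defined in the text, that returns the backpropagation increments for a single training tuple:

```python
from typing import Callable, List

Grad = Callable[[List[float], List[float], int], List[float]]

def case_updating(params: List[float], data, grad: Grad) -> None:
    """Case updating: apply the increments right after each tuple."""
    for x, label in data:
        for k, d in enumerate(grad(params, x, label)):
            params[k] += d

def epoch_updating(params: List[float], data, grad: Grad) -> None:
    """Epoch updating: accumulate increments over the whole training
    set, then apply them once at the end of the epoch."""
    total = [0.0] * len(params)
    for x, label in data:
        for k, d in enumerate(grad(params, x, label)):
            total[k] += d
    for k, d in enumerate(total):
        params[k] += d
```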
Terminating condition: Training stops when

- all Δw_ij in the previous epoch are so small as to be below some specified threshold, or
- the percentage of tuples misclassified in the previous epoch is below some threshold, or
- a prespecified number of epochs has expired.
In practice, several hundred thousand epochs may be required before the weights converge.
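Wired into a training loop, the three terminating conditions might look like the sketch below. The helpers `grad` and `misclassification_rate` are assumed, and the thresholds are arbitrary placeholder values:

```python
def train(params, data, grad, misclassification_rate,
          delta_eps=1e-4, err_thresh=0.05, max_epochs=500_000):
    """Repeat epochs of case updating until a terminating condition holds."""
    for epoch in range(max_epochs):              # condition 3: epoch budget
        max_delta = 0.0
        for x, label in data:
            for k, d in enumerate(grad(params, x, label)):
                params[k] += d
                max_delta = max(max_delta, abs(d))
        if max_delta < delta_eps:                # condition 1: all increments tiny
            break
        if misclassification_rate(params, data) < err_thresh:
            break                                # condition 2: error low enough
    return params
```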
“How efficient is backpropagation?” The computational efficiency depends on the time spent training the network. Given |D| tuples and w weights, each epoch requires O(|D| × w) time. However, in the worst-case scenario, the number of epochs can be exponential in n, the number of inputs. In practice, the time required for the networks to converge is highly variable. A number of techniques exist that help speed up the training time. For example, a technique known as simulated annealing can be used, which also ensures convergence to a global optimum.
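As a rough illustration of the simulated-annealing idea, and not the text's own algorithm, the sketch below perturbs the weight vector at random and occasionally accepts worse solutions so the search can escape local minima; the Gaussian perturbation and geometric cooling schedule are arbitrary choices:

```python
import math
import random

def simulated_annealing(error, params, t0=1.0, cooling=0.995, steps=10_000):
    """Minimize error(params) by random perturbation; accept a worse
    candidate with probability exp(-increase / temperature)."""
    current = list(params)
    current_err = error(current)
    temp = t0
    for _ in range(steps):
        candidate = [w + random.gauss(0.0, 0.1) for w in current]
        cand_err = error(candidate)
        worse_by = cand_err - current_err
        if worse_by < 0 or random.random() < math.exp(-worse_by / temp):
            current, current_err = candidate, cand_err  # accept the move
        temp *= cooling                                 # cool gradually
    return current
```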
Example 9.1 Sample calculations for learning by the backpropagation algorithm. Figure 9.5 shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in Table 9.1, along with the first training tuple, X = (1, 0, 1), with a class label of 1.
This example shows the calculations for backpropagation, given the first training tuple, X. The tuple is fed into the network, and the net input and output of each unit are computed.
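The feed-forward step the example performs computes, for each unit j, the net input I_j = Σ_i w_ij O_i + θ_j and the sigmoid output O_j = 1/(1 + e^(−I_j)). A minimal sketch of that computation follows; the weights and bias below are placeholders, since Table 9.1 is not reproduced in this excerpt:

```python
import math

def net_input(weights, inputs, bias):
    """I_j = sum_i (w_ij * O_i) + theta_j."""
    return sum(w * o for w, o in zip(weights, inputs)) + bias

def unit_output(i_j):
    """O_j = 1 / (1 + e^(-I_j)), the logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-i_j))

# Placeholder weights and bias -- Table 9.1 is not reproduced here.
x = [1, 0, 1]                       # the first training tuple X
i_j = net_input([0.2, -0.3, 0.4], x, bias=-0.4)
print(i_j, unit_output(i_j))        # net input and output of one hidden unit
```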
 