inadequate solutions may occur. A rule of thumb is to set the learning rate to 1/t, where t is the number of iterations through the training set so far.
Biases are updated by the following equations, where Δθ_j is the change in bias θ_j:

    Δθ_j = (l) Err_j                    (9.10)

    θ_j = θ_j + Δθ_j                    (9.11)
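To make the update step concrete, here is a minimal Python sketch, not from the text: the function name and data layout are illustrative. It applies the bias updates of Eqs. (9.10) and (9.11), the analogous weight update Δw_ij = (l) Err_j O_i, and the 1/t learning-rate rule of thumb mentioned above:

```python
def update_unit(weights, bias, inputs, err_j, t):
    """Case update for one unit j: each weight gets the increment
    (l) * Err_j * O_i, and the bias gets Delta(theta_j) = (l) * Err_j
    per Eqs. (9.10)-(9.11). Learning rate l = 1/t (rule of thumb)."""
    l = 1.0 / t
    new_weights = [w + l * err_j * o_i for w, o_i in zip(weights, inputs)]
    new_bias = bias + l * err_j
    return new_weights, new_bias

# e.g.: update_unit([0.2, -0.3], 0.1, [1.0, 0.0], err_j=0.05, t=3)
```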
Note that here we are updating the weights and biases after the presentation of each tuple. This is referred to as case updating. Alternatively, the weight and bias increments could be accumulated in variables, so that the weights and biases are updated after all the tuples in the training set have been presented. This latter strategy is called epoch updating, where one iteration through the training set is an epoch. In theory, the mathematical derivation of backpropagation employs epoch updating, yet in practice, case updating is more common because it tends to yield more accurate results.
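The contrast between the two strategies can be sketched as follows. Here `grad` stands for an assumed routine, not defined in the text, that returns the backpropagation increments for a single training tuple:

```python
from typing import Callable, List

Grad = Callable[[List[float], List[float], int], List[float]]

def case_updating(params: List[float], data, grad: Grad) -> None:
    """Case updating: apply the increments right after each tuple."""
    for x, label in data:
        for k, d in enumerate(grad(params, x, label)):
            params[k] += d

def epoch_updating(params: List[float], data, grad: Grad) -> None:
    """Epoch updating: accumulate increments over the whole training
    set, then apply them once at the end of the epoch."""
    total = [0.0] * len(params)
    for x, label in data:
        for k, d in enumerate(grad(params, x, label)):
            total[k] += d
    for k, d in enumerate(total):
        params[k] += d
```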
Terminating condition: Training stops when

- all Δw_ij in the previous epoch are so small as to be below some specified threshold, or
- the percentage of tuples misclassified in the previous epoch is below some threshold, or
- a prespecified number of epochs has expired.
In practice, several hundred thousand epochs may be required before the weights converge.
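Wired into a training loop, the three terminating conditions might look like the sketch below. The helpers `grad` and `misclassification_rate` are assumed, and the thresholds are arbitrary placeholder values:

```python
def train(params, data, grad, misclassification_rate,
          delta_eps=1e-4, err_thresh=0.05, max_epochs=500_000):
    """Repeat epochs of case updating until a terminating condition holds."""
    for epoch in range(max_epochs):              # condition 3: epoch budget
        max_delta = 0.0
        for x, label in data:
            for k, d in enumerate(grad(params, x, label)):
                params[k] += d
                max_delta = max(max_delta, abs(d))
        if max_delta < delta_eps:                # condition 1: all increments tiny
            break
        if misclassification_rate(params, data) < err_thresh:
            break                                # condition 2: error low enough
    return params
```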
“How efficient is backpropagation?” The computational efficiency depends on the time spent training the network. Given |D| tuples and w weights, each epoch requires O(|D| × w) time. However, in the worst-case scenario, the number of epochs can be exponential in n, the number of inputs. In practice, the time required for the networks to converge is highly variable. A number of techniques exist that help speed up the training time. For example, a technique known as simulated annealing can be used, which also ensures convergence to a global optimum.
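As a rough illustration of the simulated-annealing idea, and not the text's own algorithm, the sketch below perturbs the weight vector at random and occasionally accepts worse solutions so the search can escape local minima; the Gaussian perturbation and geometric cooling schedule are arbitrary choices:

```python
import math
import random

def simulated_annealing(error, params, t0=1.0, cooling=0.995, steps=10_000):
    """Minimize error(params) by random perturbation; accept a worse
    candidate with probability exp(-increase / temperature)."""
    current = list(params)
    current_err = error(current)
    temp = t0
    for _ in range(steps):
        candidate = [w + random.gauss(0.0, 0.1) for w in current]
        cand_err = error(candidate)
        worse_by = cand_err - current_err
        if worse_by < 0 or random.random() < math.exp(-worse_by / temp):
            current, current_err = candidate, cand_err  # accept the move
        temp *= cooling                                 # cool gradually
    return current
```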
Example 9.1 Sample calculations for learning by the backpropagation algorithm. Figure 9.5 shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in Table 9.1, along with the first training tuple, X = (1, 0, 1), with a class label of 1.
This example shows the calculations for backpropagation, given the first training tuple, X. The tuple is fed into the network, and the net input and output of each unit are computed.
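The feed-forward step the example performs computes, for each unit j, the net input I_j = Σ_i w_ij O_i + θ_j and the sigmoid output O_j = 1/(1 + e^(−I_j)). A minimal sketch of that computation follows; the weights and bias below are placeholders, since Table 9.1 is not reproduced in this excerpt:

```python
import math

def net_input(weights, inputs, bias):
    """I_j = sum_i (w_ij * O_i) + theta_j."""
    return sum(w * o for w, o in zip(weights, inputs)) + bias

def unit_output(i_j):
    """O_j = 1 / (1 + e^(-I_j)), the logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-i_j))

# Placeholder weights and bias -- Table 9.1 is not reproduced here.
x = [1, 0, 1]                       # the first training tuple X
i_j = net_input([0.2, -0.3, 0.4], x, bias=-0.4)
print(i_j, unit_output(i_j))        # net input and output of one hidden unit
```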
 