The gradient $\partial E(n)/\partial w_{ji}(n)$ represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight $w_{ji}(n)$. Using the chain rule, the gradient can be described as:

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_{j}(n)}\,\frac{\partial e_{j}(n)}{\partial y_{j}(n)}\,\frac{\partial y_{j}(n)}{\partial v_{j}(n)}\,\frac{\partial v_{j}(n)}{\partial w_{ji}(n)} \qquad (7.7)$$

But:

$$\frac{\partial E(n)}{\partial e_{j}(n)} = e_{j}(n), \qquad \frac{\partial e_{j}(n)}{\partial y_{j}(n)} = -1, \qquad \frac{\partial y_{j}(n)}{\partial v_{j}(n)} = \varphi_{j}'(v_{j}(n)) \qquad (7.8)$$

And:

$$\frac{\partial v_{j}(n)}{\partial w_{ji}(n)} = y_{i}(n) \qquad (7.9)$$

Then:

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -\,e_{j}(n)\,\varphi_{j}'(v_{j}(n))\,y_{i}(n) \qquad (7.10)$$

According to the delta rule:

$$\Delta w_{ji}(n) = -\,\eta\,\frac{\partial E(n)}{\partial w_{ji}(n)} \qquad (7.11)$$

where η is the learning rate and the negative sign determines a direction of weight change that reduces the value of the error. Therefore:

$$\Delta w_{ji}(n) = \eta\,e_{j}(n)\,\varphi_{j}'(v_{j}(n))\,y_{i}(n) = \eta\,\delta_{j}(n)\,y_{i}(n) \qquad (7.12)$$

where $\delta_{j}(n) = e_{j}(n)\,\varphi_{j}'(v_{j}(n))$ is the local gradient. The local gradient is calculated differently for hidden and output neurons. For neuron j of the output layer:

$$\delta_{j}(n) = e_{j}(n)\,\varphi_{j}'(v_{j}(n)) \qquad (7.13)$$

For neuron j of a hidden layer k, the local gradient back-propagates the local gradients of layer k + 1:

$$\delta_{j}^{k}(n) = \varphi_{j}'(v_{j}(n)) \sum_{m} \delta_{m}^{k+1}(n)\,w_{mj}^{k+1}(n) \qquad (7.14)$$
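As a concrete illustration of eqns 7.7 to 7.14, the following sketch (a minimal NumPy example, not taken from the source; the layer sizes, logistic activation and random data are assumptions) computes the errors and the local gradients for the output and hidden layers of a small feed-forward network:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):                 # phi'(v) for the logistic activation
    s = sigmoid(v)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                # inputs y_i to the hidden layer
W1 = rng.normal(size=(4, 3))          # hidden-layer weights w_ji
W2 = rng.normal(size=(2, 4))          # output-layer weights w_mj
d = np.array([0.0, 1.0])              # desired outputs

v1 = W1 @ x;  y1 = sigmoid(v1)        # hidden induced fields v_j and outputs y_j
v2 = W2 @ y1; y2 = sigmoid(v2)        # output induced fields and outputs

e = d - y2                            # errors e_j(n)
delta_out = e * sigmoid_prime(v2)     # eqn (7.13): output-layer local gradients
delta_hid = sigmoid_prime(v1) * (W2.T @ delta_out)  # eqn (7.14): hidden layer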
Synaptic weights in layer k are fitted according to the general delta rule:
$$w_{ji}^{k}(n+1) = w_{ji}^{k}(n) + \alpha\,\Delta w_{ji}^{k}(n-1) + \eta\,\delta_{j}^{k}(n)\,y_{i}^{k-1}(n) \qquad (7.15)$$
where η is the learning rate and α is the momentum term (Haykin, 2001).
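Continuing the sketch above, the update of eqn 7.15 applied to the output-layer weights might look as follows (the values of η and α are arbitrary examples, not from the source):

eta, alpha = 0.1, 0.9                 # learning rate and momentum (example values)
dW2_prev = np.zeros_like(W2)          # previous weight change, zero at the start

# eqn (7.15): new change = momentum term + delta-rule term
dW2 = alpha * dW2_prev + eta * np.outer(delta_out, y1)
W2 = W2 + dW2
dW2_prev = dW2                        # remembered for the next cycle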
The lower the value of η, the smoother the trajectory in weight space, but learning becomes slow. Conversely, high values of η give fast learning but may destabilize the network. The learning rate is a proportionality constant between zero and one: values close to zero make learning very slow, whereas values close to one may cause the network to oscillate without learning. The learning rate should therefore be adaptive and controlled by the network.
The momentum rate is a parameter that also ranges between zero and one; it gives the search enough speed to escape local minima that would otherwise prevent the system from reaching the global optimum.
When is the process interrupted? Basheer and Hajmeer (2000) identified several criteria for stopping training, such as minimum error, number of cycles and cross-validation. Under the first criterion, the process stops when the minimized error falls below an adopted threshold. The number of cycles defines how many times the dataset is submitted to training. Excessive training may cause the network to lose its generalization power (overfitting), while too little training results in poor performance (underfitting). It is therefore difficult to determine when training should stop.
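The first two criteria can be combined in a simple training loop, sketched below (train_one_cycle is a hypothetical helper that presents the whole dataset once and returns the training error; the threshold and cycle limit are arbitrary examples):

def train_until_stopped(train_one_cycle, max_cycles=1000, err_min=1e-3):
    # Stop on the minimum-error criterion or when the cycle limit is reached
    for cycle in range(1, max_cycles + 1):
        err = train_one_cycle()
        if err < err_min:             # minimum-error criterion met
            break
    return cycle, err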
One way is to detect the onset of excessive training by cross-validation, which divides the training set into two subsets: one for training and one for validation. Training is performed on the training subset for several periods (cycles), and after each training cycle the selected models are tested on the validation subset. This procedure is called the early stopping training method. The training and validation learning curves present different behaviour: during training the curve decreases monotonically as the number of cycles increases, whereas the validation curve decreases to a minimum and then begins to rise as the network starts to overfit, so training is stopped at that minimum.
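A schematic version of early stopping is sketched below, assuming hypothetical helpers train_one_cycle() and validation_error(), and a simple "patience" rule for deciding that the validation minimum has been passed (one common variant, not necessarily the form used by the authors):

def early_stopping(train_one_cycle, validation_error,
                   max_cycles=1000, patience=10):
    best_err, best_cycle, waited = float("inf"), 0, 0
    for cycle in range(max_cycles):
        train_one_cycle()             # one pass over the training subset
        err = validation_error()      # test on the validation subset
        if err < best_err:            # validation curve still decreasing
            best_err, best_cycle, waited = err, cycle, 0
        else:                         # curve has begun to rise
            waited += 1
            if waited >= patience:
                break                 # assume the minimum has been passed
    return best_cycle, best_err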