The learning procedure using gradient descent over the parameter space requires error rates to be computed for the $p$th training data and for each node's output $O$, given by:

$$\frac{\partial E_p}{\partial O_{i,p}^{L}} = -2\left(T_{i,p} - O_{i,p}^{L}\right) \qquad (4.10)$$
The error rate for the internal node at $(k, i)$ can be derived using the chain rule:

$$\frac{\partial E_p}{\partial O_{i,p}^{k}} = \sum_{m=1}^{\#(k+1)} \frac{\partial E_p}{\partial O_{m,p}^{k+1}} \, \frac{\partial O_{m,p}^{k+1}}{\partial O_{i,p}^{k}} \qquad (4.11)$$

where $1 \le k \le L - 1$ and $\#(k+1)$ is the number of nodes in layer $k+1$. Given $\alpha$ as a parameter of the given adaptive network, we have
$$\frac{\partial E_p}{\partial \alpha} = \sum_{O^{*} \in S} \frac{\partial E_p}{\partial O^{*}} \, \frac{\partial O^{*}}{\partial \alpha} \qquad (4.12)$$
where $S$ is the set of nodes whose outputs depend on $\alpha$. The derivative of the overall error measure $E$ with respect to $\alpha$ is then given by Equation 4.13:
$$\frac{\partial E}{\partial \alpha} = \sum_{p=1}^{P} \frac{\partial E_p}{\partial \alpha} \qquad (4.13)$$
Furthermore, we can describe the update formula for $\alpha$ as Equation 4.14:

$$\Delta \alpha = -\eta \, \frac{\partial E}{\partial \alpha} \qquad (4.14)$$

in which $\eta$ is the learning rate.
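As a minimal illustration, the batch gradient-descent update of Equations 4.10, 4.13, and 4.14 can be sketched in Python. The single-parameter model $O_p = \alpha x_p$, the training data, and the learning-rate value below are illustrative assumptions, not the ANFIS network described in the text.

```python
import numpy as np

# Hedged sketch of the gradient-descent update in Eqs. 4.10, 4.13, 4.14.
# The one-parameter model O_p = alpha * x_p and all values below are
# illustrative assumptions, not the book's adaptive network.

def train_alpha(x, T, alpha, eta, epochs):
    for _ in range(epochs):
        O = alpha * x                    # node output for every pattern p
        dEp_dO = -2.0 * (T - O)          # Eq. 4.10: error rate at the output
        dEp_dalpha = dEp_dO * x          # chain rule, since dO_p/dalpha = x_p
        dE_dalpha = dEp_dalpha.sum()     # Eq. 4.13: sum over all P patterns
        alpha = alpha - eta * dE_dalpha  # Eq. 4.14: delta_alpha = -eta * dE/dalpha
    return alpha

x = np.array([1.0, 2.0, 3.0])            # inputs for P = 3 training patterns
T = 0.5 * x                              # targets produced by alpha = 0.5
alpha = train_alpha(x, T, alpha=0.0, eta=0.01, epochs=200)
```

Each epoch accumulates the per-pattern gradients before applying one update, matching the batch form of Equation 4.13; an online variant would instead apply Equation 4.14 after every training pattern.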
Equations 4.6 to 4.14 describe the structure and learning process of the adaptive network. In an ANFIS architecture, this network should be functionally equivalent to a fuzzy inference system. To illustrate this mapping, consider a simple case of an ANFIS system with two inputs $x_1$ and $x_2$ and one output, $y$. Suppose the rule base contains two fuzzy IF-THEN rules. Then we may write:
Rule 1: IF $x_1$ is $A_1$ and $x_2$ is $B_1$, THEN $f_1 = p_1 x_1 + q_1 x_2 + r_1$

Rule 2: IF $x_1$ is $A_2$ and $x_2$ is $B_2$, THEN $f_2 = p_2 x_1 + q_2 x_2 + r_2$
where $A_i$ and $B_i$ are the antecedents, $f_i$ is the output of the neuron (node) in the same layer, and $p_i$, $q_i$, and $r_i$ are the parameters specific to the node. In the adaptive network, the membership function describing an antecedent can be denoted by the following node function:
$$O_i^{1} = \mu_{A_i}(x) \qquad (4.15)$$
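To make the mapping concrete, the layer-1 node function of Equation 4.15 and the consequents of the two rules above can be sketched in Python. The generalized bell shape for $\mu$, the product AND for combining the two antecedent degrees, and every numeric parameter are illustrative assumptions; the text does not fix a particular membership function at this point.

```python
import numpy as np

# Hedged sketch of Eq. 4.15 (layer-1 node function O_i^1 = mu_{A_i}(x))
# and the consequents f1, f2 of Rules 1 and 2. The generalized bell
# membership and all parameter values are illustrative assumptions.

def bell_mf(x, a, b, c):
    # Generalized bell: mu(x) = 1 / (1 + |(x - c)/a|^(2b))
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def consequent(x1, x2, p, q, r):
    # Rule consequent: f = p*x1 + q*x2 + r
    return p * x1 + q * x2 + r

x1, x2 = 3.0, 7.0
mu_A1 = bell_mf(x1, a=2.0, b=2.0, c=2.0)     # O_1^1 = mu_{A1}(x1)
mu_B1 = bell_mf(x2, a=2.0, b=2.0, c=6.0)     # mu_{B1}(x2)

w1 = mu_A1 * mu_B1                           # Rule 1 firing strength (product AND)
f1 = consequent(x1, x2, p=1.0, q=1.0, r=0.0) # Rule 1 output for these parameters
```

With the assumed parameters, each layer-1 node simply reports how well its input matches the linguistic label ($A_1$ or $B_1$), and the rule's first-order polynomial consequent is evaluated on the raw inputs.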