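$$S_i^* = \lim_{t \to \infty} \left[ (1-b)^t S_i(0) + b \sum_{k=1}^{t} (1-b)^{t-k} p^* \right] = p^* \tag{13.29}$$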
The difference between the current intensity and the equilibrium intensity is reduced by the fraction $b$ every time the intensity is updated. For example, assume that $p^* = 500$, $b = 0.1$ and $S_i(t) = 100$; then $S_i(t+1) = 100 - 10 + 50 = 140$. Note that the error $p^* - S_i$ is reduced from 400 to 360; that is, the error is reduced by 10%.
Under a constant reward, each rule's intensity therefore converges geometrically to the equilibrium intensity (the error shrinks by the factor $1-b$ at each update), and the reward only needs to be evaluated when the plot is over. A possible restriction of PSP is that credit must be assigned over intervals that correspond to plots delimited by the exterior reward, so it is very important to select such plots appropriately.
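The worked example above can be checked with a short simulation. The following is a minimal sketch in Python, assuming the update has the form of (13.26), $S_i(t+1) = S_i(t) - bS_i(t) + bp(t)$, with a constant reward $p^*$. The function names `psp_update` and `psp_trajectory` are illustrative and do not come from the text.

```python
def psp_update(strength, reward, b):
    """One PSP-style update: S(t+1) = S(t) - b*S(t) + b*p(t)."""
    return strength - b * strength + b * reward


def psp_trajectory(s0, reward, b, steps):
    """Return the intensities after 0..steps updates under a constant reward."""
    values = [s0]
    for _ in range(steps):
        values.append(psp_update(values[-1], reward, b))
    return values


if __name__ == "__main__":
    p_star, b = 500.0, 0.1          # values taken from the worked example
    traj = psp_trajectory(100.0, p_star, b, steps=50)
    print(traj[1])                  # intensity after the first update
    print(p_star - traj[1])         # remaining error after the first update
    print(p_star - traj[-1])        # remaining error after 50 updates
```

Each update multiplies the remaining error $p^* - S_i$ by $1 - b$, so after $n$ updates the error is $(1-b)^n$ times its initial value.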
Suppose that rule $R_i$ is ignited at step $\tau$ and rule $R_j$ at step $\tau+1$. Then BBA uses the following formula to modify the intensity $S_i$ of rule $R_i$:

$$S_i(\tau+1) = S_i(\tau) - b S_i(\tau) + b S_j(\tau) \tag{13.30}$$

Except that the plot index $t$ is replaced with the step index $\tau$ and the exterior reward $p(t)$ is replaced with the intensity $S_j$ of rule $R_j$, this formula is the same as (13.26). The first change means that a rule's intensity may be modified more than once within a given plot. The second change leads to the basic difference between PSP and BBA. Consider two rules $R_i$ and $R_j$, where rule $R_j$ is ignited immediately after rule $R_i$. Assume that $R_i$ and $R_j$ are each ignited at most once in a plot; then we have:
$$S_i(t) = (1-b)^t S_i(0) + \sum_{k=1}^{t} b (1-b)^{t-k} S_j(k-1) \tag{13.31}$$
where $t$ ranges over the plots in which both rules are active. In other words, the intensity of $R_i$ follows that of $R_j$. If $S_j$ converges to a constant $S_j^*$, then $S_i$ also converges.
$$S_i^* = \lim_{t \to \infty} S_i(t) = \lim_{t \to \infty} \left[ (1-b)^t S_i(0) + \sum_{k=1}^{t} b (1-b)^{t-k} S_j(k-1) \right] = S_j^* \tag{13.32}$$
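The limit holds because the weights $b(1-b)^{t-k}$ sum to $1-(1-b)^t$, which tends to 1, so $S_i(t)$ becomes a weighted average of recent values of $S_j$. This can be illustrated numerically with a minimal Python sketch. It assumes a two-rule chain in which $R_i$ is updated once per plot from the current intensity of $R_j$, and $R_j$ is then updated from a constant exterior reward `p_star`. This setup and the names `bba_closed_form` and `simulate_chain` are assumptions made for illustration, not part of the text.

```python
def bba_closed_form(s_i0, s_j_history, b):
    """Closed form (13.31): S_i(t) computed from S_i(0) and S_j(0), ..., S_j(t-1)."""
    t = len(s_j_history)
    total = (1 - b) ** t * s_i0
    for k in range(1, t + 1):
        total += b * (1 - b) ** (t - k) * s_j_history[k - 1]
    return total


def simulate_chain(s_i0, s_j0, p_star, b, plots):
    """Simulate `plots` plots of a hypothetical two-rule chain R_i -> R_j.

    In each plot R_i is updated from R_j's current intensity (form of 13.30),
    then R_j is updated from the constant exterior reward (form of 13.26).
    """
    s_i, s_j = [s_i0], [s_j0]
    for _ in range(plots):
        s_i.append(s_i[-1] - b * s_i[-1] + b * s_j[-1])
        s_j.append(s_j[-1] - b * s_j[-1] + b * p_star)
    return s_i, s_j


if __name__ == "__main__":
    b, p_star = 0.1, 500.0
    s_i, s_j = simulate_chain(0.0, 100.0, p_star, b, plots=200)
    # The step-by-step recursion and the closed form should agree up to rounding.
    print(abs(s_i[-1] - bba_closed_form(0.0, s_j[:-1], b)))
    # Both intensities should end up close to p_star.
    print(s_j[-1], s_i[-1])
```

In this sketch $S_j$ approaches `p_star` first and $S_i$ follows with a lag, which is the behaviour that (13.32) describes.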
Similarly, formula (13.29) shows that $S_j$ can converge to $S_j^*$ (the internal payoff for $R_i$). This kind of analysis can be extended to any rule chain. For example, when $\langle R_1 R_2 \ldots R_n \rangle$ are ignited in turn, only rule $R_n$ can receive