Information Technology Reference
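$$S_i^* = \lim_{t \to \infty} \left[ (1-b)^t S_i(0) + b\,p^* \sum_{k=1}^{t} (1-b)^{t-k} \right] = p^* \tag{13.29}$$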
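The limit in (13.29) can be checked with a geometric sum. The derivation below is only a sketch: it assumes $0 < b < 1$ and a constant reward $p^*$, and the summation indices $k$ and $m$ are introduced here purely for the derivation.

```latex
\begin{align*}
b\,p^* \sum_{k=1}^{t} (1-b)^{t-k}
  &= b\,p^* \sum_{m=0}^{t-1} (1-b)^{m}
   = b\,p^* \cdot \frac{1-(1-b)^{t}}{b}
   = p^* \left[ 1-(1-b)^{t} \right], \\
S_i(t) &= (1-b)^{t} S_i(0) + p^* \left[ 1-(1-b)^{t} \right]
        = p^* - (1-b)^{t} \left[ p^* - S_i(0) \right]
        \;\longrightarrow\; p^* \quad (t \to \infty).
\end{align*}
```

The second line also makes the convergence rate visible: the gap $p^* - S_i(t)$ is multiplied by $1-b$ at every update.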
The difference between the current intensity and the equilibrium intensity shrinks by the fraction $b$ (that is, it is multiplied by $1-b$) every time the intensity is changed. For example, assume that $p^* = 500$, $b = 0.1$ and $S_i(t) = 100$; then $S_i(t+1) = 100 - 10 + 50 = 140$. Note that the error $p^* - S_i$ is reduced from 400 to 360; that is, this error is reduced by 10%.
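The arithmetic of this example can be replayed in a few lines. The following is a minimal sketch, assuming the update rule S(t+1) = S(t) - b·S(t) + b·p(t) with a constant reward; the function names `psp_update` and `psp_trace` are hypothetical and chosen only for illustration.

```python
def psp_update(s, reward, b):
    """One intensity update: S(t+1) = S(t) - b*S(t) + b*reward."""
    return s - b * s + b * reward


def psp_trace(s0, reward, b, steps):
    """Intensities S(0), S(1), ..., S(steps) under a constant reward."""
    trace = [s0]
    for _ in range(steps):
        trace.append(psp_update(trace[-1], reward, b))
    return trace


p_star, b = 500.0, 0.1            # values from the example in the text
trace = psp_trace(100.0, p_star, b, steps=50)
errors = [p_star - s for s in trace]

print(trace[1])                   # intensity after the first update
print(errors[1] / errors[0])      # ratio of successive errors, close to 1 - b
print(trace[-1])                  # close to p_star after 50 updates
```

Each update multiplies the remaining error by 1 - b, so a larger b converges faster but also makes the intensity react more strongly to a single reward.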
Under a constant reward, then, each rule's intensity converges quickly to its equilibrium intensity, and the reward can be evaluated when the plot is over. A possible restriction of PSP is that credit must be assigned over the interval corresponding to a plot, and plots are delimited by the exterior reward. Selecting such plots well is therefore very important.
Suppose that rule $R_i$ is ignited at step $\tau$ and rule $R_j$ at step $\tau+1$. Then BBA uses the following formula to modify the intensity $S_i$ of rule $R_i$:

$$S_i(\tau+1) = S_i(\tau) - b S_i(\tau) + b S_j(\tau) \tag{13.30}$$

Except that the plot index $t$ is replaced with the step index $\tau$ and the exterior reward $p(t)$ is replaced with the intensity $S_j$ of rule $R_j$, this formula is the same as (13.26). The first change means that a rule's intensity can be modified more than once in a given plot. The second change leads to the basic difference between PSP and BBA. Consider two rules $R_i$ and $R_j$, where $R_j$ is ignited immediately after $R_i$. Assume that $R_i$ and $R_j$ are each ignited at most once in a plot; then we have:
$$S_i(t) = (1-b)^t S_i(0) + \sum_{k=1}^{t} b (1-b)^{t-k} S_j(k-1) \tag{13.31}$$
where $t$ ranges over the plots in which both rules are active. In other words, the intensity of $R_i$ follows that of $R_j$. If $S_j$ can converge to a constant $S_j^*$, then $S_i$ can also converge.
$$S_i^* = \lim_{t \to \infty} S_i(t) = \lim_{t \to \infty} \left[ (1-b)^t S_i(0) + \sum_{k=1}^{t} b (1-b)^{t-k} S_j(k-1) \right] = S_j^* \tag{13.32}$$
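The way $S_i$ follows $S_j$ can be illustrated numerically. The sketch below is not the book's implementation. It assumes a two-rule chain in which, once per plot, $R_i$ is paid from the previous intensity of $R_j$ and $R_j$ receives a constant exterior reward `p_star` (this reward model is an assumption made here so that $S_j$ converges). The names `simulate_chain` and `closed_form_s_i` are hypothetical. The limit equals $S_j^*$ because the weights $b(1-b)^{t-k}$ sum to $1-(1-b)^t$, which tends to 1 when $0 < b < 1$.

```python
def simulate_chain(s_i0, s_j0, p_star, b, plots):
    """Return [(S_i(t), S_j(t))] for t = 0..plots for a two-rule chain."""
    s_i, s_j = s_i0, s_j0
    trace = [(s_i, s_j)]
    for _ in range(plots):
        # Both right-hand sides use the values from the end of the previous
        # plot, so R_i is paid from S_j(t-1), matching the sum in (13.31).
        s_i, s_j = (1 - b) * s_i + b * s_j, (1 - b) * s_j + b * p_star
        trace.append((s_i, s_j))
    return trace


def closed_form_s_i(trace, b, t):
    """Evaluate the right-hand side of (13.31) from the recorded S_j values."""
    s_i0 = trace[0][0]
    total = sum(b * (1 - b) ** (t - k) * trace[k - 1][1] for k in range(1, t + 1))
    return (1 - b) ** t * s_i0 + total


b, p_star = 0.1, 500.0            # assumed illustrative values
trace = simulate_chain(100.0, 100.0, p_star, b, plots=200)

print(abs(trace[10][0] - closed_form_s_i(trace, b, 10)))  # near zero
print(trace[-1])                  # both intensities end up close to p_star
```

In this setup $S_i$ lags behind $S_j$ during the early plots and then settles at the same value, which is the behaviour that (13.32) describes.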
Similarly, formula (13.29) shows that $S_j$ can converge to $S_j^*$ (the internal payoff for $R_i$). This kind of analysis can be extended to any rule chain. For example, when $\langle R_1, R_2, \ldots, R_n \rangle$ are ignited in turn, only rule $R_n$ can receive