Fig. 7.6 Multi-agent Q-learning using simulated annealing is not guaranteed to converge to a Nash
equilibrium. In this example, the Nash equilibrium is (L, D). However, during initial exploration,
when all actions are equiprobable, player 2 may decide that U is the better choice, as it has an
average profit of 8 while D only yields a profit of 5. Since it is unaware of the actions taken by
player 1, player 2 fails to notice that player 1 settles on L. Hence, player 2 goes for the safe option,
where it cannot be hurt by the exploration of player 1
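The failure mode described in the caption can be reproduced with a short Python sketch. The payoff matrix of Fig. 7.6 is not reproduced here, so the matrix below is a hypothetical one chosen to match the caption: U always pays player 2 a profit of 8 (the "safe" choice), D averages 5 under uniform exploration but pays 9 once player 1 settles on L, and (L, D) is the unique Nash equilibrium. The learning rate and cooling schedule are likewise illustrative, not taken from the original experiments.

```python
import math
import random

# Hypothetical 2x2 game consistent with the caption of Fig. 7.6. Entries are
# (payoff to player 1, payoff to player 2) for the joint action (a1, a2).
# For player 2, U pays 8 regardless of player 1's action, while D averages
# (9 + 1) / 2 = 5 under uniform exploration although it pays 9 against L;
# the unique Nash equilibrium is (L, D).
PAYOFFS = {
    ("L", "U"): (6.0, 8.0),
    ("L", "D"): (10.0, 9.0),
    ("R", "U"): (4.5, 8.0),
    ("R", "D"): (8.5, 1.0),
}
ACTIONS_1, ACTIONS_2 = ["L", "R"], ["U", "D"]


def boltzmann(q, actions, temp):
    """Boltzmann action selection; the temperature is annealed over time."""
    m = max(q[a] for a in actions)                      # avoid overflow
    weights = [math.exp((q[a] - m) / temp) for a in actions]
    return random.choices(actions, weights=weights)[0]


def independent_q_learning(episodes=2000, alpha=0.05, temp=50.0, cooling=0.99):
    """Two stateless Q-learners that never observe each other's action."""
    q1 = {a: 0.0 for a in ACTIONS_1}
    q2 = {a: 0.0 for a in ACTIONS_2}
    for _ in range(episodes):
        a1 = boltzmann(q1, ACTIONS_1, temp)
        a2 = boltzmann(q2, ACTIONS_2, temp)
        r1, r2 = PAYOFFS[(a1, a2)]
        q1[a1] += alpha * (r1 - q1[a1])   # each agent only sees its own reward
        q2[a2] += alpha * (r2 - q2[a2])
        temp = max(temp * cooling, 0.01)  # simulated-annealing temperature decay
    return q1, q2


if __name__ == "__main__":
    q1, q2 = independent_q_learning()
    greedy = (max(q1, key=q1.get), max(q2, key=q2.get))
    # In most runs the learners settle on (L, U): player 2 keeps the "safe"
    # action it valued during exploration and misses the equilibrium (L, D).
    print("Q-values:", q1, q2, "greedy joint action:", greedy)
```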
Table 7.1 Spatial learning: simulation parameters

(Rate [Mbps], SINR [dB])   Power [mW]
(9, 7.78)                  250
(18, 10.79)                125
(36, 18.80)                66.25
(54, 24.56)                33.125
7.4 Assessing the Gains
We evaluated the performance of the proposed algorithm through an extensive simulation study with
the network simulator ns-2.29 [129].
All simulations were run using the simple path loss model of ns-2.29, without shadowing. As can
be seen in Table 7.1, we assumed that terminals are capable of transmitting at four discrete rates,
to allow a fair comparison with [98]. We also used four power levels. For parameters not listed in
Table 7.1 or Table 7.2, we used the default values of ns-2.29.
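As an illustration of this parameter set, the snippet below encodes the rate/SINR pairs and power levels of Table 7.1 as Python constants and derives the highest rate a terminal could select for a given SINR. The cross-product action set and the helper function are illustrative assumptions, not part of the ns-2.29 configuration.

```python
from itertools import product

# Rate/SINR pairs from Table 7.1: (rate [Mbps], minimum SINR [dB]).
RATE_TABLE = [(9, 7.78), (18, 10.79), (36, 18.80), (54, 24.56)]

# The four transmit power levels [mW] from Table 7.1.
POWER_LEVELS = [250, 125, 66.25, 33.125]

# Illustrative action set: every combination of a rate and a power level.
ACTIONS = list(product([rate for rate, _ in RATE_TABLE], POWER_LEVELS))


def highest_feasible_rate(sinr_db):
    """Highest rate of Table 7.1 whose SINR requirement is met, if any."""
    feasible = [rate for rate, threshold in RATE_TABLE if sinr_db >= threshold]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    print(len(ACTIONS), "rate/power combinations")            # 16
    print(highest_feasible_rate(20.0), "Mbps at 20 dB SINR")  # 36
```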
First, we present an illustrative example that demonstrates the benefits of the different
contributions. The scenario is shown in Fig. 7.7(a). In Fig. 7.7(b), we can see that the carrier
sense threshold T_CS based on T_Rx is too defensive. Link 1 can increase its throughput with a
smarter state space selection. This, however, has a negative impact on link 2. When using spatial
learning (SL) without power flexibility, convergence is quick. When we allow link 1 to decrease its
power, the interference experienced by link 2 is reduced and it is able to sustain a higher rate.
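The effect of lowering link 1's power can be illustrated with a back-of-the-envelope SINR computation. The sketch below combines the SINR thresholds of Table 7.1 with an assumed path-loss exponent, distances, and noise power (none of which are taken from the scenario of Fig. 7.7) to show how dropping link 1's transmit power from 250 mW to 33.125 mW pushes link 2's SINR over the next rate threshold.

```python
import math

# SINR thresholds [dB] per rate [Mbps], from Table 7.1.
RATE_TABLE = [(9, 7.78), (18, 10.79), (36, 18.80), (54, 24.56)]

# Toy geometry and radio parameters (assumed for illustration only).
PATH_LOSS_EXP = 3.0      # simple path-loss exponent
NOISE_MW = 1e-9          # receiver noise power [mW]
D_SIGNAL = 20.0          # link 2 transmitter -> link 2 receiver distance [m]
D_INTERF = 60.0          # link 1 transmitter -> link 2 receiver distance [m]
P2_MW = 125.0            # link 2 transmit power [mW]


def received_mw(p_tx_mw, distance_m):
    """Received power under a bare d^-n path-loss model (no shadowing)."""
    return p_tx_mw / distance_m ** PATH_LOSS_EXP


def best_rate(p1_mw):
    """Highest Table 7.1 rate link 2 can sustain for a given link 1 power."""
    signal = received_mw(P2_MW, D_SIGNAL)
    interference = received_mw(p1_mw, D_INTERF)
    sinr_db = 10 * math.log10(signal / (interference + NOISE_MW))
    feasible = [rate for rate, thr in RATE_TABLE if sinr_db >= thr]
    return (max(feasible) if feasible else 0), sinr_db


if __name__ == "__main__":
    for p1 in (250.0, 33.125):  # link 1 at its highest vs. lowest power level
        rate, sinr = best_rate(p1)
        print(f"link 1 at {p1} mW -> link 2 SINR {sinr:.1f} dB, rate {rate} Mbps")
```

With these assumed distances, link 2 sits at roughly 11 dB SINR (18 Mbps) while link 1 transmits at full power, and at roughly 20 dB (36 Mbps) once link 1 backs off to its lowest power level.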