using a soft-max policy, where each node selects an action with a probability given
by the Boltzmann distribution [127]:
$$p_Q(s_c, s_n) = \frac{e^{Q(s_c, s_n)/T}}{\sum_{s \in A(s_c)} e^{Q(s_c, s)/T}}, \qquad (7.5)$$
where T is the temperature that controls the amount of exploration. For high values of T the actions are almost equiprobable. By annealing the algorithm (cooling it down), the policy becomes more and more greedy. We use the following annealing scheme, denoting by θ the annealing factor:
$$T_{k+1} = \theta T_k. \qquad (7.6)$$
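As a concrete illustration, the Python sketch below samples an action from the Boltzmann distribution of (7.5) and cools the temperature according to (7.6). The Q-values, starting temperature, and annealing factor are illustrative placeholders rather than values taken from the text.

```python
import numpy as np

def boltzmann_policy(q_row, temperature):
    """Action probabilities for the current state s_c, following Eq. (7.5)."""
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the resulting distribution.
    scaled = (q_row - np.max(q_row)) / temperature
    exp_q = np.exp(scaled)
    return exp_q / exp_q.sum()

def select_action(q_row, temperature, rng):
    """Sample the next setting s_n from the Boltzmann distribution."""
    return rng.choice(len(q_row), p=boltzmann_policy(q_row, temperature))

rng = np.random.default_rng(0)
q_row = np.array([0.2, 0.5, 0.1, 0.4])  # illustrative Q-values for one state s_c
T, theta = 10.0, 0.95                   # illustrative starting temperature and annealing factor

for k in range(100):
    action = select_action(q_row, T, rng)
    # ... apply the chosen setting, observe the reward, update the Q-table ...
    T *= theta                          # Eq. (7.6): the policy becomes increasingly greedy
```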
To further improve network-wide throughput and fairness, we allow transmitters
to tune their power. It has been established that the best response is to always send
at a higher power. This will lead to the Nash equilibrium, where all terminals are
using the maximum power. Hence, we need to give nodes a small incentive to scale
down the power. We do this by introducing a cost for using higher powers:
$$r^{(p)}(s_n, s_c) = \rho(i_n) S(s_n) - \rho(i_c) S(s_c), \qquad (7.7)$$
where i is the power index (i = 0 refers to the lowest power). The reward including the power cost is denoted r^(p). The reward factors are defined as follows:
$$\rho(i) = \rho^{\,i}, \quad i \in [0, n_p - 1], \qquad (7.8)$$
where ρ is an element of (0, 1] and n_p is the number of available transmission powers.
With a high ρ, nodes will scale down their power until they see a drop in throughput. This is somewhat similar to the power control mechanism described in [94]. With a lower ρ, they will even accept a throughput reduction.
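To make the power cost concrete, the hypothetical helper below evaluates the power-aware reward of (7.7) with the reward factors of (7.8). It assumes that S(·) is the measured throughput of a configuration; the values chosen for ρ, n_p, and the throughputs are purely illustrative.

```python
def power_reward(throughput_new, throughput_cur, i_new, i_cur, rho=0.99, n_p=8):
    """Power-aware reward r^(p)(s_n, s_c) of Eq. (7.7) with rho(i) = rho**i (Eq. 7.8).

    i_new, i_cur : power indices in [0, n_p - 1], where 0 is the lowest power
    rho          : reward factor in (0, 1]; it discounts throughput obtained
                   at higher transmission powers
    """
    assert 0 <= i_new < n_p and 0 <= i_cur < n_p
    return (rho ** i_new) * throughput_new - (rho ** i_cur) * throughput_cur


# A node that keeps the same throughput while moving to a lower power index
# receives a positive reward -- the small incentive to scale the power down.
print(power_reward(throughput_new=10.0, throughput_cur=10.0, i_new=2, i_cur=5))
```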
Similar to the heuristic recommendation for starvation-free scenarios (see
Sect. 7.3.4 ), we allow links with a good channel to scale down their power without
dropping their throughput. As a result, interference levels drop for the surrounding links. These links may now be able to send at a higher rate, which improves
network-wide throughput and fairness.
7.3.6 Seeding the Learning Engine with the DT Procedures
For each (combination of) scenario(s), we have defined heuristic recommendations
in Sect. 7.3.4 . For instance, when a node is dealing with asymmetric starvation, it
makes sense to either increase the power or decrease the carrier sense threshold in
order to alleviate this situation. At DT, however, we do not know which of the two actions is better.
Hence, we need to incorporate the heuristic recommendations into the Q-learning mechanism described above. The idea is that the heuristic recommendations are
followed during the exploration phase and that when the temperature cools down,