boundary detection and the appearance of IEEE 802.11 interference. For both scenarios, the heuristic recommendations increase the exploration rate of the engine.
5.6.2 Learning Engine
In this chapter, we consider a simple Auto-Regressive (AR) filter. In contrast to Q-learning (see Chap. 7 and [67]), the AR-filter maintains a one-dimensional table rather than a two-dimensional one. This has no impact on performance in the current context, as the reward function is independent of the current state of the terminal.
Our goal in this section is to learn the reward of selecting a certain channel. To do so, we estimate the reward function, $R_w(f)$, presented in Sect. 5.5. This is done by updating its current estimate using the following rule:
$$R_w^{(k+1)}(f) = (1 - \alpha)\, R_w^{(k)}(f) + \alpha R_w(f), \qquad (5.5)$$
where $R_w^{(k)}(f)$ is the estimate of the reward function at frequency $f$ at interval $k$, $\alpha$ is the learning parameter, and $R_w(f)$ is the current evaluation of the reward function.
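To make the update concrete, the following Python sketch maintains the one-dimensional table described above and applies Eq. (5.5) after each observation. The channel range, the value of $\alpha$, and all identifiers are illustrative assumptions, not part of the framework itself.

```python
# Minimal sketch of the AR-filter update of Eq. (5.5).
# All names and parameter values are illustrative assumptions.
ALPHA = 0.1  # learning parameter alpha (assumed value)

# One-dimensional table: one reward estimate per IEEE 802.15.4
# channel in the 2.4 GHz band (channels 11-26); no state dimension.
reward_estimate = {f: 0.0 for f in range(11, 27)}

def update_estimate(f, observed_reward, alpha=ALPHA):
    """R^(k+1)(f) = (1 - alpha) * R^(k)(f) + alpha * R(f)."""
    reward_estimate[f] = (1.0 - alpha) * reward_estimate[f] + alpha * observed_reward
```

Note that the sketch only updates the table; which channel to evaluate next is left entirely to the decision policy, mirroring the decoupling discussed next.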
It is important to note that the AR-filter only updates the estimates; it does not specify which actions should be taken. Instead of greedily optimizing this reward function, arbitrary experimentation is allowed. This is an important property in a time-varying environment, as it decouples the learning phase from the decision policy.
5.6.3 Exploration Algorithms
The algorithms presented in Sect. 5.5 suffer from hardware complexity and extra energy cost due to out-of-band scanning. In this section, out-of-band scanning is no longer performed. Hence, the candidate set, $C_i$, covers all IEEE 802.15.4 channels. Differentiation between the channels is based on the observed history rather than on instantaneous observations.
The algorithms presented in Sect. 5.5 also suffer from over-exploration once steady state has been reached, as no cooling scheme is defined. As mentioned above, the goal of simulated annealing is to balance exploitation and exploration through a cooling scheme that defines the value and evolution of $T$ in Eq. 5.4. When $T$ is high, the channel is selected randomly. However, as we lower $T$ according to the cooling scheme, exploration is reduced and we eventually converge to a greedy selection. This means that the DT procedure used in this chapter is the simple RFS.
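As an illustration of such a cooling scheme, the sketch below assumes that Eq. 5.4 is a Boltzmann (softmax) selection rule over the estimated rewards; the geometric cooling factor, the temperature bounds, and all identifiers are assumptions made for this example only. With a high $T$ the selection is close to uniformly random, and as $T$ is lowered the selection approaches the greedy choice.

```python
import math
import random

def select_channel(estimates, temperature):
    """Boltzmann (softmax) selection; assumes Eq. 5.4 has this exponential form."""
    channels = list(estimates)
    m = max(estimates.values())  # subtract the maximum for numerical stability
    weights = [math.exp((estimates[f] - m) / temperature) for f in channels]
    return random.choices(channels, weights=weights, k=1)[0]

# Geometric cooling: start close to random selection, converge towards greedy.
estimates = {f: 0.0 for f in range(11, 27)}   # AR-filter estimates (see Eq. 5.5)
T, COOLING_FACTOR, T_MIN = 1.0, 0.95, 0.01    # assumed cooling parameters
for interval in range(200):
    f = select_channel(estimates, T)
    # ... operate on channel f, observe its reward, update estimates[f]
    # via Eq. (5.5), then lower the temperature:
    T = max(T_MIN, COOLING_FACTOR * T)
```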
However, we show in Sect. 5.6.3.2 why a blind learner performs very suboptimally in the present context, and how the framework presented in Chap. 3 can be used to overcome this problem. Hence, in the present context DT knowledge is not