boundary detection and the appearance of IEEE 802.11 interference. For both scenarios, the heuristic recommendations increase the exploration rate of the engine.
5.6.2 Learning Engine
In this chapter, we consider a simple Auto-Regressive (AR) filter. In contrast to Q-learning (see Chap. 7 and [67]), the AR-filter maintains a one-dimensional table rather than a two-dimensional one. This has no impact on performance in the current context, as the reward function is independent of the current state of the terminal.
Our goal in this section is to learn the reward of selecting a certain channel. To do so, we estimate the reward function, $R_w(f)$, presented in Sect. 5.5. This is done by updating its current estimate using the following rule:
$$R_w^{(k+1)}(f) = (1 - \alpha)\, R_w^{(k)}(f) + \alpha R_w(f), \qquad (5.5)$$
where $R_w^{(k)}(f)$ is the estimate of the reward function at frequency $f$ at interval $k$, $\alpha$ is the learning parameter, and $R_w(f)$ is the current evaluation of the reward function.
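To make the update concrete, the following Python sketch maintains the one-dimensional table described above and applies Eq. (5.5) after each observation. The channel range, the value of $\alpha$, and all identifiers are illustrative assumptions, not part of the framework itself.

```python
# Minimal sketch of the AR-filter update of Eq. (5.5).
# All names and parameter values are illustrative assumptions.
ALPHA = 0.1  # learning parameter alpha (assumed value)

# One-dimensional table: one reward estimate per IEEE 802.15.4
# channel in the 2.4 GHz band (channels 11-26); no state dimension.
reward_estimate = {f: 0.0 for f in range(11, 27)}

def update_estimate(f, observed_reward, alpha=ALPHA):
    """R^(k+1)(f) = (1 - alpha) * R^(k)(f) + alpha * R(f)."""
    reward_estimate[f] = (1.0 - alpha) * reward_estimate[f] + alpha * observed_reward
```

Note that the sketch only updates the table; which channel to evaluate next is left entirely to the decision policy, mirroring the decoupling discussed next.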
It is important to note that the AR-filter only updates the estimates; it does not specify which actions should be taken. Instead of greedily optimizing this reward function, arbitrary experimentation is allowed. This is an important property in a time-varying environment, as it decouples the learning phase from the decision policy.
5.6.3 Exploration Algorithms
The algorithms presented in Sect. 5.5 suffer from hardware complexity and extra energy cost due to out-of-band scanning. In this section, out-of-band scanning is no longer performed. Hence, the candidate set, $C_i$, covers all IEEE 802.15.4 channels. Differentiation between the channels is based on the observed history rather than on instantaneous observations.
The algorithms presented in Sect. 5.5 also suffer from over-exploration once steady state has been reached, as no cooling scheme is defined. As mentioned above, the goal of simulated annealing is to balance exploitation and exploration through a cooling scheme that defines the value and evolution of $T$ in Eq. 5.4. When $T$ is high, the channel is selected randomly. However, as we lower $T$ according to the cooling scheme, exploration is reduced and we eventually converge to a greedy selection. This means that the DT procedure used in this chapter is the simple RFS.
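As an illustration of such a cooling scheme, the sketch below assumes that Eq. 5.4 is a Boltzmann (softmax) selection rule over the estimated rewards; the geometric cooling factor, the temperature bounds, and all identifiers are assumptions made for this example only. With a high $T$ the selection is close to uniformly random, and as $T$ is lowered the selection approaches the greedy choice.

```python
import math
import random

def select_channel(estimates, temperature):
    """Boltzmann (softmax) selection; assumes Eq. 5.4 has this exponential form."""
    channels = list(estimates)
    m = max(estimates.values())  # subtract the maximum for numerical stability
    weights = [math.exp((estimates[f] - m) / temperature) for f in channels]
    return random.choices(channels, weights=weights, k=1)[0]

# Geometric cooling: start close to random selection, converge towards greedy.
estimates = {f: 0.0 for f in range(11, 27)}   # AR-filter estimates (see Eq. 5.5)
T, COOLING_FACTOR, T_MIN = 1.0, 0.95, 0.01    # assumed cooling parameters
for interval in range(200):
    f = select_channel(estimates, T)
    # ... operate on channel f, observe its reward, update estimates[f]
    # via Eq. (5.5), then lower the temperature:
    T = max(T_MIN, COOLING_FACTOR * T)
```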
However, we show in Sect. 5.6.3.2 why a blind learner performs very suboptimally in the present context, and how the framework presented in Chap. 3 can be used to overcome this problem. Hence, in the present context DT knowledge is not