3.4.3.4 The RT Procedure
When the RT procedure is not fully defined at RT, the actions to be taken must be learned and calibrated. Learning and calibration, however, require feedback. As mentioned above, feedback is crucial for developing CR systems: based on feedback from the environment, a CR system can explore and learn the behavior and response of the environment. Based on this behavior and response, the underlying environment models are adapted and new RT procedures can be defined.
Feedback is linguistically defined as [48]:
1. “the process of returning part of the output of a circuit, system, or device to the
input, either to oppose the input (negative feedback) or to aid the input (positive
feedback)”,
2. “a reaction or response to a particular process or activity”,
3. “evaluative information derived from such a reaction or response”.
Unlike monitors, which passively observe the environment, feedback provides information about the environment through its reaction to an action taken by the wireless terminal.
Feedback Channel (F_i), i = 1, ..., f. For a given wireless environment, f feedback channels can be defined. Rather than passive monitoring, feedback contains information about how the environment reacts to specific actions taken by the wireless terminal. Rather than directly observing the parameters of the RT situations, a feedback channel observes the effect of the actions taken by the wireless terminal on the environment.
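To make this distinction concrete, the following minimal Python sketch contrasts a passive monitor with a feedback channel. All names (ToyEnvironment, Monitor, FeedbackChannel, react) are hypothetical illustrations, not interfaces defined in this book, and the environment model is a toy stand-in.

import random

class ToyEnvironment:
    # Hypothetical stand-in for the wireless environment.
    def current_state(self):
        # e.g. an interference level that can be sampled passively
        return random.uniform(0.0, 1.0)

    def response_to(self, tx_power):
        # e.g. packet success ratio observed after transmitting at tx_power
        return min(1.0, 0.2 + 0.7 * tx_power + random.uniform(-0.05, 0.05))

class Monitor:
    # Passive monitoring: observes the environment without taking any action.
    def __init__(self, env):
        self.env = env
    def observe(self):
        return self.env.current_state()

class FeedbackChannel:
    # Feedback: observes how the environment reacts to an action taken
    # by the wireless terminal.
    def __init__(self, env):
        self.env = env
    def react(self, action):
        return self.env.response_to(action)

env = ToyEnvironment()
print("monitor sample:", Monitor(env).observe())
print("feedback for tx_power = 0.5:", FeedbackChannel(env).react(0.5))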
As mentioned above, the RT learning engine calibrates the template of the DT procedure. In the example considered in Chap. 7 of this book, reinforcement learning is used for the calibration.
The objective of the RT calibration is to find the action that yields the most re-
ward. It is interesting to see that learning algorithms learn a procedure, but do not
dictate whether this procedure should be used or not. Exploration beyond the cur-
rent RT procedure is not only allowed, but advised. Traditionally, this exploration
is done blindly. The major problem with blind learning algorithms is that they need to explore all possible actions. As a result, they scale poorly.
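As an illustration of such feedback-driven calibration, the sketch below uses a simple epsilon-greedy scheme over a discrete action set: the learner estimates the reward of each action from feedback and mostly selects the current best one. This is only a generic reinforcement-learning sketch, not the specific algorithm of Chap. 7; the function names and the toy reward model are assumptions.

import random

def epsilon_greedy_calibration(actions, feedback, episodes=500, epsilon=0.1):
    # actions:  list of candidate actions (e.g. candidate operating points)
    # feedback: function mapping an action to an observed reward
    value = {a: 0.0 for a in actions}   # running reward estimate per action
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.choice(actions)                 # explore
        else:
            a = max(actions, key=lambda x: value[x])   # exploit current best
        r = feedback(a)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]          # incremental mean update
    return max(actions, key=lambda x: value[x])

# Toy usage: the (unknown) reward peaks at action 3; the noise stands in
# for the unpredictable wireless environment.
best = epsilon_greedy_calibration(
    actions=list(range(10)),
    feedback=lambda a: -abs(a - 3) + random.uniform(-0.5, 0.5),
)
print("best action found:", best)

Note that the number of episodes required grows with the size of the action set, which is exactly why blind exploration over all possible actions scales poorly.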
By allowing the DT procedure to steer the exploration of the algorithm away from dominated operating points, and by defining the set of possible actions, the number of points the learning engine needs to traverse can be reduced significantly. Ideally, the DT procedure presents a single point, the optimum; this is the case when the RT environment is well known and easily modeled (see Chap. 6). However, as discussed above, this is no longer feasible for wireless networks.
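The pruning itself can be as simple as discarding Pareto-dominated operating points before the remaining candidates are handed to the RT learning engine. The sketch below assumes each operating point is a (cost, quality) pair, for instance (energy, throughput); the function name and the example configurations are hypothetical.

def prune_dominated(points):
    # A point is dominated if another point has cost <= its cost and
    # quality >= its quality (and differs in at least one of the two).
    kept = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            kept.append(p)
    return kept

# Toy configuration set: (energy cost, expected throughput).
configs = [(1.0, 2.0), (2.0, 5.0), (2.5, 4.0), (3.0, 6.0), (3.5, 6.0)]
print("actions handed to the RT learning engine:", prune_dominated(configs))
# Only the non-dominated points remain, so the learning engine traverses
# far fewer points than with blind exploration.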
Reinforcement learning captures the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal [49]. In contrast to supervised learning, which learns from examples provided by a genie, and to costly techniques such as neural networks or simulated annealing, which rely on the availability of a test set, reinforcement learning learns from interaction with the
environment. In interactive problems it is often impractical to obtain examples of