3.4.3.4 The RT Procedure
When the RT procedure is not fully defined at RT, the actions to be taken must be learned and calibrated. Learning and calibration, however, require feedback. As mentioned above, feedback is crucial for developing CR systems: based on feedback from the environment, a CR system can explore and learn the behavior and response of the environment. Based on this behavior and response, the underlying environment models are adapted and new RT procedures can be defined.
Feedback is linguistically defined as [48]:
1. “the process of returning part of the output of a circuit, system, or device to the
input, either to oppose the input (negative feedback) or to aid the input (positive
feedback)”,
2. “a reaction or response to a particular process or activity”,
3. “evaluative information derived from such a reaction or response”.
Unlike monitors, which passively observe the environment, feedback provides information about the environment through its reaction to an action taken by the wireless terminal.
Feedback Channel (F_i), i = 1, ..., f. For a given wireless environment, f feedback channels can be defined. Rather than passive monitoring, feedback contains information about how the environment reacts to specific actions taken by the wireless terminal. Rather than directly observing the parameters of the RT situations, a feedback channel observes the effect of the actions taken by the wireless terminal on the environment.
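To make this distinction concrete, the following minimal Python sketch contrasts a passive monitor with a feedback channel. All names (ToyEnvironment, Monitor, FeedbackChannel, react) are hypothetical illustrations, not interfaces defined in this book, and the environment model is a toy stand-in.

import random

class ToyEnvironment:
    # Hypothetical stand-in for the wireless environment.
    def current_state(self):
        # e.g. an interference level that can be sampled passively
        return random.uniform(0.0, 1.0)

    def response_to(self, tx_power):
        # e.g. packet success ratio observed after transmitting at tx_power
        return min(1.0, 0.2 + 0.7 * tx_power + random.uniform(-0.05, 0.05))

class Monitor:
    # Passive monitoring: observes the environment without taking any action.
    def __init__(self, env):
        self.env = env
    def observe(self):
        return self.env.current_state()

class FeedbackChannel:
    # Feedback: observes how the environment reacts to an action taken
    # by the wireless terminal.
    def __init__(self, env):
        self.env = env
    def react(self, action):
        return self.env.response_to(action)

env = ToyEnvironment()
print("monitor sample:", Monitor(env).observe())
print("feedback for tx_power = 0.5:", FeedbackChannel(env).react(0.5))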
As mentioned above, the RT learning engine calibrates the template of the DT procedure. In the example considered in Chap. 7 of this book, reinforcement learning is used for the calibration.
The objective of the RT calibration is to find the action that yields the most re-
ward. It is interesting to see that learning algorithms learn a procedure, but do not
dictate whether this procedure should be used or not. Exploration beyond the cur-
rent RT procedure is not only allowed, but advised. Traditionally, this exploration
is done blindly. The major problem with blind learning algorithms is that they need to explore all possible actions. As a result, they scale poorly.
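As an illustration of such feedback-driven calibration, the sketch below uses a simple epsilon-greedy scheme over a discrete action set: the learner estimates the reward of each action from feedback and mostly selects the current best one. This is only a generic reinforcement-learning sketch, not the specific algorithm of Chap. 7; the function names and the toy reward model are assumptions.

import random

def epsilon_greedy_calibration(actions, feedback, episodes=500, epsilon=0.1):
    # actions:  list of candidate actions (e.g. candidate operating points)
    # feedback: function mapping an action to an observed reward
    value = {a: 0.0 for a in actions}   # running reward estimate per action
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.choice(actions)                 # explore
        else:
            a = max(actions, key=lambda x: value[x])   # exploit current best
        r = feedback(a)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]          # incremental mean update
    return max(actions, key=lambda x: value[x])

# Toy usage: the (unknown) reward peaks at action 3; the noise stands in
# for the unpredictable wireless environment.
best = epsilon_greedy_calibration(
    actions=list(range(10)),
    feedback=lambda a: -abs(a - 3) + random.uniform(-0.5, 0.5),
)
print("best action found:", best)

Note that the number of episodes required grows with the size of the action set, which is exactly why blind exploration over all possible actions scales poorly.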
By allowing the DT procedure to steer the exploration of the algorithm away from dominated operating points, and by defining the set of possible actions, the number of points the learning engine needs to traverse can be reduced significantly. Ideally, the DT procedure presents a single point, the optimum; this is the case when the RT environment is well known and easily modeled (see Chap. 6). However, as discussed above, this is no longer feasible for wireless networks.
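The pruning itself can be as simple as discarding Pareto-dominated operating points before the remaining candidates are handed to the RT learning engine. The sketch below assumes each operating point is a (cost, quality) pair, for instance (energy, throughput); the function name and the example configurations are hypothetical.

def prune_dominated(points):
    # A point is dominated if another point has cost <= its cost and
    # quality >= its quality (and differs in at least one of the two).
    kept = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            kept.append(p)
    return kept

# Toy configuration set: (energy cost, expected throughput).
configs = [(1.0, 2.0), (2.0, 5.0), (2.5, 4.0), (3.0, 6.0), (3.5, 6.0)]
print("actions handed to the RT learning engine:", prune_dominated(configs))
# Only the non-dominated points remain, so the learning engine traverses
# far fewer points than with blind exploration.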
Reinforcement learning captures the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal [49]. In contrast to supervised learning, which learns from examples provided by a genie, and to costly techniques such as neural networks or simulated annealing, which rely on the availability of a test set, reinforcement learning learns from interaction with the
environment. In interactive problems it is often impractical to obtain examples of