In [123], the authors describe what kind of engineering problems naturally lend
themselves to multi-agent learning. They strongly support the development of multi-
agent learning techniques starting from the application perspective. Also in [120],
the authors present a critical view on the advancement of multi-agent learning tech-
niques. They criticize the quest for convergence to equilibria, as these may be subop-
timal (e.g., the Nash equilibrium in the prisoner's dilemma). They claim researchers
should start from the problem and see how they can improve over state-of-the-art
algorithms using these learning approaches.
In this chapter, we adopt this vision: we select one of the simplest multi-agent learning techniques, independent q-learning, and investigate how it can be used in an engineering problem. Although convergence of independent q-learning is not guaranteed, it has been shown that it can outperform such Nash equilibria by finding mutually beneficial configurations [124].
Nash q-learning is an alternative that is known to converge to a Nash equilibrium.
However, it requires knowledge of the actions taken by all players. As the interference ranges for IEEE 802.11 are larger than the transmission ranges, this is not possible in our current scenario. Under these informational constraints, it has been proved in [125] that no algorithm currently exists that can converge to the Nash equilibrium.
This makes independent q-learning a viable candidate for the current problem.
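To make the contrast concrete: an independent q-learner maintains only its own Q-table and updates it from its own observations, never requiring the actions of the other players. The sketch below is an illustrative, generic implementation, not the exact update used in this chapter; the action set, learning rate, discount factor, and epsilon-greedy exploration are all assumptions chosen for demonstration.

```python
import random

class IndependentQLearner:
    """One agent's local view: it learns its own Q-table and treats
    the other agents simply as part of the environment."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = {}             # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy action selection over this agent's own Q-table.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update, driven only by this agent's own
        # observed reward; no knowledge of other agents' actions is needed.
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)
```

In a network setting, each IEEE 802.11 node would run one such learner on its own control dimensions; this is precisely the informational setting in which Nash q-learning is infeasible but independent q-learning remains applicable.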
In line with Chap. 3, we design DT procedures (heuristics) to speed up RT convergence of the algorithm. Here, the use of these DT procedures can also improve the performance of the steady-state result. Heuristics improve blind learning methods by adding domain knowledge to the learner. As mentioned above, the authors of [120, 123] strongly support this approach.
7.3 Spatial Learning: Distributed Optimization of IEEE 802.11
Networks
In this section, we explain our control algorithm for distributed optimization of IEEE 802.11 networks, resulting from an instantiation of the smart radio framework presented in Chap. 3. First, we introduce the instantiated framework in Sect. 7.3.1 and make the link with Chap. 3. Afterwards, we introduce the control dimensions in Sect. 7.3.2. Scenario identification is discussed in Sect. 7.3.3. Next, we introduce our DT procedures in Sect. 7.3.4. The learning engine to calibrate these DT procedures is discussed in Sect. 7.3.5. In Sect. 7.3.6, we explain how the DT procedures and the learning engine fit together. Finally, we end this section with a discussion of implementation details in Sect. 7.3.7.
7.3.1 The General Framework
In this section, we introduce the general framework of our new control algorithm, Spatial Learning (see Fig. 7.1). Similar to the CR cycle in [16], the learner interprets