Even though agents employ a greedy policy (selecting the action that gives the highest
sum of Q-values), this “smooth” exploration strategy ensures that all slots are explored
and updated regularly at the start of the application (since values are initialized
randomly), until the sum of Q-values of one group of slots becomes strictly larger than
that of the rest. In that case we say that the policy has converged and thus exploration
has stopped. The speed of convergence is influenced by the duty cycle, fixed by the user,
and the learning rate, which we empirically chose to be 0.1. A constant learning rate
is in fact desirable in a non-stationary environment to ensure that policies will change
with respect to the most recently received rewards [15].
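The interplay of greedy selection and a constant learning rate can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the frame size, the wrap-around of the slot group at the frame boundary, and the exact update form Q ← (1 − α)Q + αr are assumptions made here; only the random initialization, the greedy choice over sums of Q-values, and the learning rate of 0.1 come from the text.

```python
import random

ALPHA = 0.1  # constant learning rate reported in the text


def init_q_values(num_slots, rng):
    """Random initial Q-value per slot, as the text describes."""
    return [rng.random() for _ in range(num_slots)]


def greedy_window(q_values, awake_slots):
    """Start index of the contiguous slot group with the highest Q-sum.

    Assumption: the group wraps around the frame boundary.
    """
    n = len(q_values)
    best_start, best_sum = 0, float("-inf")
    for start in range(n):
        total = sum(q_values[(start + i) % n] for i in range(awake_slots))
        if total > best_sum:
            best_start, best_sum = start, total
    return best_start


def update_slot(q_values, slot, reward, alpha=ALPHA):
    """Constant-step update (assumed form): recent rewards outweigh older ones."""
    q_values[slot] = (1 - alpha) * q_values[slot] + alpha * reward


if __name__ == "__main__":
    rng = random.Random(0)             # hypothetical seed for repeatability
    q = init_q_values(10, rng)         # hypothetical frame of 10 slots
    start = greedy_window(q, awake_slots=3)
    update_slot(q, start, reward=1.0)  # hypothetical reward value
    print(start, q[start])
```

With a constant α, the weight of a reward received k updates ago shrinks as (1 − α)^k, so old rewards fade and the estimate keeps tracking a changing environment, which is the property the paragraph above relies on.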
3 Results
We proceed with the experimental comparison between our (de)synchronization ap-
proach and a fully synchronized state-of-the-art MAC protocol, viz. S-MAC [17]. All
components of the compared networks, such as the routing and CSMA communication
protocols, remain the same. The S-MAC protocol illustrates network performance un-
der synchronized behavior, where all nodes are active at the same time. In other words,
we compare our RL technique to networks with no coordination mechanism, but which
employ some means of time synchronization, the small overhead of which will be ne-
glected for the sake of a clearer exposition. This synchronized approach ensures high
network throughput, but as we will demonstrate in subsection 3.2, it fails at short duty
cycles.
3.1 Experimental Setup
We applied our approach to three networks of different size and topology. In particu-
lar, we investigate two extreme cases where nodes are arranged in a 4-node line (Fig-
ure 3(a)) and a 6-node single-hop mesh topology (Figure 4(a)). The former one requires
nodes to synchronize in order to successfully forward messages to the sink. Intuitively,
if any one node is awake while the others are asleep, that node would not be able to
forward its messages to the sink. Conversely, in the mesh topology it is most beneficial
for nodes to fully desynchronize to avoid communication interference with neighboring
nodes. Moreover, the sink is able to communicate with only one node at a time. The
third topology is a 4 by 4 grid (Figure 5(a)) where sensing agents need to both syn-
chronize with some nodes and at the same time desynchronize with others to maximize
throughput and network lifetime. This third topology clearly illustrates the importance
of combining synchronicity and desynchronicity, as neither one of the two behaviors
alone achieves the global system objectives. Subsection 3.2 will confirm these claims
and will elaborate on the obtained results.
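To make the three topologies concrete, the sketch below builds them as adjacency lists. It is an illustration only: the node numbering, the 4-neighbor connectivity of the grid, and the helper names are assumptions made here, and the position of the sink is not modeled because the text above does not specify it.

```python
def line_topology(n):
    """Each node hears only its immediate predecessor and successor."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def mesh_topology(n):
    """Single-hop mesh: every node is in range of every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def grid_topology(rows, cols):
    """Grid with assumed 4-neighbor (up, down, left, right) connectivity."""
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            neighbors = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    neighbors.append(rr * cols + cc)
            adjacency[r * cols + c] = neighbors
    return adjacency


if __name__ == "__main__":
    print(line_topology(4))          # 4-node line
    print(mesh_topology(6)[0])       # neighbors of node 0 in the 6-node mesh
    print(grid_topology(4, 4)[5])    # neighbors of an interior grid node
```

In the line, a node can only relay through its neighbors, so their active periods must overlap; in the mesh, every node interferes with every other; in the grid, each node has both relay partners and interferers, which is why it needs a mix of the two behaviors.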
Each of the three networks was run for 3600 seconds in the OMNeT++ simula-
tor [11] and results were averaged over 30 runs. This network runtime was sufficiently
long to eliminate any initial transient effects. To illustrate the performance of the net-
work at high data rates, we set the sampling period of nodes to one message every 10
seconds. For each node the start of this period is at a uniformly random time within
the first frame of the simulation and thereafter messages in that node are periodically
 