Reinforcement Learning for Self-organizing Wake-Up Scheduling in Wireless Sensor Networks - Agents and Artificial Intelligence


Even though agents employ a greedy policy (selecting the action that gives the highest

sum of Q-values), this “smooth” exploration strategy ensures that all slots are explored

and updated regularly at the start of the application (since values are initialized

randomly), until the sum of Q-values of one group of slots becomes strictly larger than

the rest. In that case we say that the policy has converged and thus exploration has

stopped. The speed of convergence is influenced by the duty cycle, fixed by the user,

and the learning rate, which we empirically chose to be 0.1. A constant learning rate

is in fact desirable in a non-stationary environment to ensure that policies will change

with respect to the most recently received rewards [15].
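
The effect of a constant learning rate can be illustrated with a minimal sketch. The exact update rule is not restated in this passage, so the exponentially weighted form below is an assumption, as are the helper names update_q and greedy_wake_slots and the choice of a contiguous, wrap-around group of awake slots whose length would follow from the duty cycle. Only the learning rate of 0.1 and the greedy selection by highest sum of Q-values come from the text.

```python
ALPHA = 0.1  # constant learning rate reported in the text


def update_q(q, reward, alpha=ALPHA):
    """Assumed update form: blend the old estimate with the newest reward.

    With a constant alpha, older rewards decay geometrically, so the
    estimate keeps tracking a reward signal that changes over time.
    """
    return (1.0 - alpha) * q + alpha * reward


def greedy_wake_slots(q_values, awake_slots):
    """Return the start index of the slot group with the largest Q-sum.

    Assumption: the group is contiguous and wraps around the frame.
    Ties are broken in favour of the lowest start index.
    """
    n = len(q_values)
    sums = [
        sum(q_values[(start + k) % n] for k in range(awake_slots))
        for start in range(n)
    ]
    return max(range(n), key=lambda start: sums[start])


def track(rewards, q=0.0):
    """Apply update_q over a reward sequence and return the final estimate."""
    for r in rewards:
        q = update_q(q, r)
    return q


if __name__ == "__main__":
    # The reward regime switches from 1 to 0 halfway through.
    q_mid = track([1.0] * 50)
    q_end = track([0.0] * 50, q=q_mid)
    print(q_mid, q_end)
    print(greedy_wake_slots([0.1, 0.9, 0.8, 0.2], awake_slots=2))
```

In this sketch the estimate first climbs towards the old reward level and then decays towards the new one after the switch, which is the behavior the text asks for in a non-stationary environment. A learning rate that decays over time would instead weight early rewards heavily and adapt ever more slowly.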

3 Results

We proceed with the experimental comparison between our (de)synchronization ap-

proach and a fully synchronized state-of-the-art MAC protocol, viz. S-MAC [17]. All

components of the compared networks, such as the routing and CSMA communication

protocols, remain the same. The S-MAC protocol illustrates network performance un-

der synchronized behavior, where all nodes are active at the same time. In other words,

we compare our RL technique to networks with no coordination mechanism, but which

employ some means of time synchronization, the small overhead of which will be ne-

glected for the sake of a clearer exposition. This synchronized approach ensures high

network throughput, but as we will demonstrate in subsection 3.2, it fails at short duty

cycles.

3.1 Experimental Setup

We applied our approach to three networks of different size and topology. In particular,

we investigate two extreme cases where nodes are arranged in a 4-node line (Figure 3(a))

and a 6-node single-hop mesh topology (Figure 4(a)). The former requires

nodes to synchronize in order to successfully forward messages to the sink. Intuitively,

if any one node is awake while the others are asleep, that node would not be able to

forward its messages to the sink. Conversely, in the mesh topology it is most beneficial

for nodes to fully desynchronize to avoid communication interference with neighboring

nodes. Moreover, the sink is able to communicate with only one node at a time. The

third topology is a 4 by 4 grid (Figure 5(a)) where sensing agents need to both syn-

chronize with some nodes and at the same time desynchronize with others to maximize

throughput and network lifetime. This last topology clearly illustrates the importance

of combining synchronicity and desynchronicity, as neither one of the two behaviors

alone achieves the global system objectives. Subsection 3.2 will confirm these claims

and will elaborate on the obtained results.
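
For readers who want to reproduce the three layouts outside the simulator, the following is a rough sketch of their connectivity as adjacency lists. It is not taken from the original setup: it assumes that nodes in the line and the grid hear only their immediate neighbors, that all nodes in the single-hop mesh are within mutual range, and it leaves the position of the sink unspecified. The function names are hypothetical.

```python
def line_topology(n=4):
    """Chain of n nodes; each node hears only its direct neighbors (assumed)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def mesh_topology(n=6):
    """Single-hop mesh; every node is assumed to be in range of every other."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def grid_topology(rows=4, cols=4):
    """rows x cols grid with assumed 4-neighbor (up/down/left/right) links."""
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            neighbors = []
            if r > 0:
                neighbors.append(node - cols)  # node above
            if r < rows - 1:
                neighbors.append(node + cols)  # node below
            if c > 0:
                neighbors.append(node - 1)     # node to the left
            if c < cols - 1:
                neighbors.append(node + 1)     # node to the right
            adjacency[node] = neighbors
    return adjacency


if __name__ == "__main__":
    for name, topo in [("line", line_topology()),
                       ("mesh", mesh_topology()),
                       ("grid", grid_topology())]:
        degrees = [len(neighbors) for neighbors in topo.values()]
        print(name, "nodes:", len(topo), "max degree:", max(degrees))
```

Under these assumptions the contrast described above shows up in the node degrees: in the line every link is needed for multi-hop forwarding, in the mesh every node contends with all others, and the grid mixes both situations.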

Each of the three networks was run for 3600 seconds in the OMNeT++ simula-

tor [11] and results were averaged over 30 runs. This network runtime was sufficiently

long to eliminate any initial transient effects. To illustrate the performance of the net-

work at high data rates, we set the sampling period of nodes to one message every 10

seconds. For each node the start of this period is at a uniformly random time within

the first frame of the simulation and thereafter messages in that node are periodically
