Reinforcement Learning for Self-organizing Wake-Up Scheduling in Wireless Sensor Networks - Agents and Artificial Intelligence

generated at the same slot every frame. Frames have the same length as the sampling

period and were divided into S = 2000 slots of 5 milliseconds each. The duration of the

slot was chosen such that only one DATA packet can be sent and acknowledged within

that time. All hardware-specific parameters, such as transmission power, bit rate, etc.,

were set according to the data sheet of our radio chip — CC2420 [1]. In addition, we

chose the protocol-specific parameters, such as packet header length and number of

retransmission retries as specified in the IEEE 802.15.4 communication protocol [3].
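
The frame timing described above can be checked with a short calculation. This is only a sketch based on the two values stated in the text (S = 2000 slots, 5 milliseconds per slot); the function names are hypothetical and not taken from the authors' simulator.

```python
# Values stated in the text.
SLOTS_PER_FRAME = 2000   # S
SLOT_DURATION_MS = 5     # long enough for one DATA packet and its ACK


def frame_length_s(slots=SLOTS_PER_FRAME, slot_ms=SLOT_DURATION_MS):
    """Length of one frame in seconds (equal to the sampling period)."""
    return slots * slot_ms / 1000.0


def slot_start_ms(slot_index, slot_ms=SLOT_DURATION_MS):
    """Offset of a slot from the start of its frame, in milliseconds."""
    return slot_index * slot_ms


print(frame_length_s())     # 10.0
print(slot_start_ms(1999))  # 9995
```

Under these values each frame, and hence each sampling period, lasts 10 seconds.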

Since collisions constitute the biggest obstacle in the pursuit of low latency, each

node contends for the channel for a small random duration within a fixed contention

window of 5 slots. To facilitate the throughput of messages at high data rates, we

deviated from the contention policy of S-MAC, which uses the entire active time as a

contention window. Instead, in our simulations we fixed the maximum contention window

of S-MAC to 5 slots for a fairer comparison.

We modeled five different events, namely overhearing (r = 0), idle listening (r = 0

for each idle slot), successful transmission (r = 1 if ACK received), unsuccessful

transmission (r = 0 if no ACK received) and successful reception (r = 1). Maximizing the

throughput requires both proper transmission and proper reception. Therefore,

we treat the two corresponding rewards equally. Furthermore, most radio chips require

nearly the same energy for sending, receiving (or overhearing) and (idle) listening [5],

making the three rewards equal. We consider these five events to be the most

energy-expensive or latency-critical in wireless communication. Additional events were also

modeled, but they were either statistically insignificant (such as busy channel) or

already covered (such as unsuccessful transmissions instead of collisions).

Due to the exponential smoothing nature of the reward update function (cf. subsection

2.4), the Q-values of slots are shifted towards the latest reward they receive.

We would expect the “goodness” of slots to decrease for negative events (e.g.

transmission was not acknowledged) and to increase for successful communication.

Therefore, the feedback agents receive is binary, i.e. $r_{s,e} \in \{0, 1\}$, since it carries the

necessary information. Other reward signals were also evaluated, resulting in similar

performance.
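
The reward scheme and its effect on the Q-values can be illustrated with a minimal sketch. The reward values are those listed in the text. The update rule shown is the generic exponential smoothing form, assumed here because the authors' exact function is defined in subsection 2.4 and not reproduced in this section. The learning rate `alpha`, its value, and the event names are hypothetical.

```python
# Binary rewards per event, as listed in the text.
REWARDS = {
    "overhearing": 0,
    "idle_listening": 0,  # applied for each idle slot
    "tx_success": 1,      # ACK received
    "tx_failure": 0,      # no ACK received
    "rx_success": 1,
}


def update_q(q, reward, alpha=0.1):
    """Generic exponential smoothing (assumed form, alpha is hypothetical).

    Moves the Q-value of a slot a fraction alpha of the way
    towards the latest reward.
    """
    return (1.0 - alpha) * q + alpha * reward


q = 0.5
q_after_success = update_q(q, REWARDS["tx_success"])  # rises towards 1
q_after_failure = update_q(q, REWARDS["tx_failure"])  # falls towards 0
```

With a binary reward, the Q-value of a slot stays within [0, 1] whenever it starts there, so it can be read as a smoothed success rate for that slot.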

3.2 Evaluation

We would like to point out that both S-MAC and our approach are controlled by the

same parameter — the duty cycle, which is fixed by the user of the system. Since the

active time of nodes in both approaches is the same, the energy consumption of the two

protocols is nearly identical. The only difference from S-MAC is that with our approach

nodes learn when to place their active period within the frame, as opposed to S-MAC,

where all nodes are awake at the beginning of the frame. Therefore, in the following

evaluation we vary the duty cycle of the nodes and monitor the average end-to-end

latency across the different simulation runs.
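
The relation between duty cycle and wake-up placement can be sketched as follows. The sketch assumes that the active period is a single contiguous block of slots and that it may wrap around the end of the frame; neither assumption is stated in the text. The function names and the `wake_start` parameter are hypothetical.

```python
SLOTS_PER_FRAME = 2000  # S, as stated in the text


def active_slot_count(duty_cycle, slots_per_frame=SLOTS_PER_FRAME):
    """Number of slots per frame in which a node is awake."""
    return round(duty_cycle * slots_per_frame)


def is_awake(slot, wake_start, duty_cycle, slots_per_frame=SLOTS_PER_FRAME):
    """True if the node is awake in the given slot.

    Assumes one contiguous active block starting at wake_start,
    wrapping around the frame boundary if necessary.
    """
    n = active_slot_count(duty_cycle, slots_per_frame)
    return (slot - wake_start) % slots_per_frame < n


# S-MAC style: every node wakes at the beginning of the frame.
smac_awake = is_awake(slot=50, wake_start=0, duty_cycle=0.05)

# Learned placement: wake_start would be chosen by the agent (value made up).
learned_awake = is_awake(slot=50, wake_start=700, duty_cycle=0.05)
```

At a 5% duty cycle a node is awake for 100 of the 2000 slots in either scheme. Only the position of those slots within the frame differs, which is why the energy consumption of the two protocols is nearly identical.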

Figure 3(b) displays an example of the resulting schedule of the line topology

(Figure 3(a)) after the action of each agent converges for a 5% duty cycle. The results indicate

that all four nodes have successfully learned to stay awake at the same time in order for

messages to be properly forwarded to the sink. In other words, we observe that all

nodes belong to the same coalition, as suggested in Figure 2. If any one node in the
