the Q-values of slots are updated sufficiently to make the policy of the agents
converge. We measured that after 80 iterations (or frames), on average, the actions of
the agents no longer change and thus the state of (de)synchronicity has been reached.
In other words, after 800 seconds each node finds the wake-up schedule that improves
message throughput and minimizes communication interference. For a static WSN, this duration
is sufficiently small compared to the lifetime of the system, which is on the order of
several days up to a couple of years depending on the duty cycle and the hardware
characteristics [4]. However, it is still unclear under which conditions convergence
can be formally guaranteed. Further research is therefore required to better characterize
the convergence criteria.
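To make the learning mechanism concrete, the following Python sketch illustrates the kind of per-slot update and active-period selection discussed in this section. It is an illustration rather than the implementation evaluated here: the frame size, duty-cycle length, learning rate, feedback values and the windowed-sum schedule selection are assumptions based on the description in this paper.

    import random

    NUM_SLOTS = 20      # slots per frame (illustrative value)
    ACTIVE_SLOTS = 5    # length of the contiguous active period (illustrative duty cycle)
    ALPHA = 0.1         # learning rate (illustrative value)

    q = [0.0] * NUM_SLOTS  # one quality value per slot

    def update_slot(slot, feedback):
        # Blend the newly observed feedback signal into the slot's Q-value.
        q[slot] = (1 - ALPHA) * q[slot] + ALPHA * feedback

    def choose_active_period():
        # Pick the start of the contiguous window whose slots have the
        # highest sum of Q-values; the node stays awake for that window.
        best_start, best_sum = 0, float("-inf")
        for start in range(NUM_SLOTS):
            window_sum = sum(q[(start + i) % NUM_SLOTS] for i in range(ACTIVE_SLOTS))
            if window_sum > best_sum:
                best_start, best_sum = start, window_sum
        return best_start

    # One simulated frame: observe a feedback signal for every slot, then re-schedule.
    for slot in range(NUM_SLOTS):
        update_slot(slot, random.uniform(-1.0, 1.0))  # stand-in for the real reward signal
    print("next active period starts at slot", choose_active_period())

In this sketch each node repeatedly blends new feedback into its per-slot values and then shifts its single active period to the part of the frame that has proven most rewarding.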
Despite the improvements that our approach offers over the standard S-MAC proto-
col, we discuss here two shortcomings that need to be addressed. First of all, the duty
cycle set by the user of the system affects all nodes equally. In other words, all nodes
are active for the same amount of time. Depending on their position in the network,
however, nodes require different durations for their active periods. Nodes close to the
sink are subject to a heavier traffic load than leaf nodes, whose active time need
not be as long. The second shortcoming of our technique concerns the coordination of
actions among active agents. Clearly, being awake at the same time is not sufficient for
two nodes to successfully exchange messages. If two agents on the same routing branch
attempt to transmit in the same slot, their messages will collide. Agents therefore need
to learn not only the time of their active period within a frame, but also when to transmit
and when to listen during that active period.
The above two shortcomings are being addressed in an extension of our algorithm,
which we call DESYDE [9]. The three main differences from the proposed approach are
outlined below; a short illustrative sketch follows the list:
1. In DESYDE we let agents learn two quality values for each slot, instead of one.
One quality value indicates how beneficial it is for the node to transmit during that
slot, while the other value indicates how good it is to listen for messages. In slots
where it is neither good to transmit nor to listen, the node will turn off its antenna
and enter sleep mode. Thus, each node learns the quality of three actions: transmit,
listen and sleep, as opposed to only wake-up and sleep.
2. The algorithm in DESYDE also differs from the one proposed in this paper in the
value of the learning rate α. In DESYDE we set this value to 1, which dramatically
alters the learning behavior of nodes. With α = 1, nodes remember only the most
recently observed feedback signal for each slot and discard older observations. In this
way the behavior of nodes resembles a Win-Stay Lose-Shift strategy [13]: in our setting,
at each slot agents repeat the action that was successful in the same slot of the
previous frame and try a different action if it was unsuccessful.
3. The last difference is the action selection method: in DESYDE nodes select at
each slot the action with the highest expected reward, rather than staying awake for
the slots with the highest sum of Q-values. If neither of the two quality values is
above 0 for a given slot, the agent selects sleep in that slot in the next frame. In this
way nodes adapt their duty cycle to the traffic load of the network and may wake
up at different slots within a frame, as opposed to holding a single active period.
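The per-slot rule that results from these three differences can be summarized in a short Python sketch. This is a hedged illustration rather than the DESYDE implementation of [9]: the frame size and feedback values are assumed, while the two quality values per slot, the α = 1 update, the threshold of 0 and the per-slot choice between transmit, listen and sleep follow the description above.

    NUM_SLOTS = 20  # slots per frame (illustrative value)

    # With a learning rate of 1, each entry simply stores the latest feedback
    # observed for that (slot, action) pair: a Win-Stay Lose-Shift style memory.
    q_transmit = [0.0] * NUM_SLOTS
    q_listen = [0.0] * NUM_SLOTS

    def observe(slot, action, feedback):
        # alpha = 1: the new feedback completely replaces the old value.
        if action == "transmit":
            q_transmit[slot] = feedback
        elif action == "listen":
            q_listen[slot] = feedback
        # a sleeping node receives no feedback for the slot

    def select_action(slot):
        # Choose the per-slot action with the highest expected reward;
        # sleep when neither transmitting nor listening looks worthwhile.
        if q_transmit[slot] <= 0 and q_listen[slot] <= 0:
            return "sleep"
        return "transmit" if q_transmit[slot] >= q_listen[slot] else "listen"

    # Example: build the schedule for the next frame from the current values.
    schedule = [select_action(s) for s in range(NUM_SLOTS)]

Because each slot is decided independently, a node near the sink can keep many slots active while a leaf node sleeps through most of the frame, which addresses the first shortcoming discussed above.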