Implementing Recovery in Low-Resource Stationary Wireless Sensor Networks (WiMoA 2011 and ICCSEA 2011)

Abstract

Implementation of certain types of protocols on wireless sensor networks (WSNs) can be difficult due to the architecture of the motes. Identification of these issues and recommendations for workarounds can be very useful for the WSN research community testing hardware.

In recent work, the authors developed and implemented clustering, reprogramming and authentication protocols involved in recovering WSNs with low resources. In attempting to integrate these protocols, several issues arose in the implementation phase connected to pre-set configurations of the motes used.

In this paper, we describe the issues arising in integrating our protocols using Zigbee with IEEE 802.15.4 and the reprogramming module Deluge, and compare our challenges and solutions with those faced by other researchers in this area. Finally, we provide recommendations for consideration by researchers who may in future consider integrating several protocols on these platforms.

Keywords: Wireless sensor network, recovery protocol, implementation.

Introduction

Wireless sensor networks (WSNs) are deployed in many mission critical applications and for monitoring of critical areas of national defence. With their development, various attacks have appeared usually with the aim of taking over nodes in the network, destroying nodes or disrupting data flow. Since the base station can easily detect if only a few nodes are sending anomalous data, attacks are usually based on compromise of a large part of the network. Detection, response to and recovery from such attacks have become major challenges in protecting sensor networks.


Numerous papers have been written on aspects of clustering, reprogramming and authentication (eg. [4], [9]) in WSNs, most of which merely simulate a WSN environment. Our focus in this paper is implementation of recovery in an actual WSN system bringing together an integrated protocol involving reprogramming, re-clustering and authentication. The authors have previous papers in this area ([5], [6], [7]). We assume that the network is low resourced but has the capability to detect an attack on the nodes and determine which nodes have been compromised (with high probability). We then introduce protocols which assist the Base Station (BS) in securely reprogramming compromised nodes and assist the WSN to securely self-organize (cluster and re-cluster).

In this article, we discuss the problems encountered in implementing node recovery across two TinyOS platforms TelosB and Mica Z using the standards-based protocols ZigBee and IEEE 802.15.4 along with the off-the-shelf software Deluge [13] for reprogramming. In developing TinyOS, the driver set was not designed to be compatible with the IEEE 802.15.4 standard and since Deluge reprogramming was also designed based on this driver set, Deluge is not compatible with IEEE 802.15.4. We also encountered problems with clock skewing and collision resistance. We compare the challenges we faced, and our solutions to them, with those faced by several other authors encountering implementation difficulties. Finally, we offer recommendations for those implementing recovery on low-resource WSN platforms.

The rest of the paper is outlined as follows. In Section 2, we discuss related work. In Section 3, we summarize our reprogramming, re-clustering and authentication protocols. Section 4 discusses the implementation issues in integrating these protocols. In Section 5, we compare our implementation issues with those of other researchers, and develop recommendations based on this comparison in Section 6.

Background and Related Work

With the aim of introducing a hierarchical structure for improving cluster management, clustering of WSNs was developed. Clusters are subsets of nodes in the WSN managed by a member of the subset referred to as an ‘aggregator’. Wu [15] proposed the enhanced multi-point relay (EMPR) algorithm to efficiently partition a WSN with a flat topology into a hierarchical network by identifying a set of multipoint relays that can be shown to be fully connected. This method is amenable to both stationary and mobile WSN environments and does not require that nodes be able to communicate directly with the base station. We adopted EMPR as the basis of our clustering algorithm and adjusted it to permit control of cluster size and choice of aggregators. A detailed description of our re-clustering approach can be found in [5].

In developing our approach to reprogramming for recovery, we considered the best current work. Current methods for over the air reprogramming include XNP from Crossbow [13], Deluge [13], MOAP [12] and incremental programming [1]. Each of these methods is designed for either multi-hop or single-hop reprogramming of all nodes in a deployment. XNP transmits the program code over a single-hop, Deluge, however, makes use of epidemic dissemination over multi-hops to reprogram all nodes in the network and MOAP also supports multi-hop programming. Incremental programming targets components of the code in the node that need to be changed, and then reprograms using only former or updated versions of these components. None of these methods enables the reprogramming of a specific node and hence none is directly suited for node recovery purposes. As the basis for our design of a reprogramming protocol for individual nodes, we chose Deluge based on the fact that it aims to increase the transmission throughput by using optimization techniques such as adjusting the packet transmission rate and spatial multiplexing. We modified the Deluge protocol to selectively reprogram nodes that have been identified as compromised. Detailed descriptions of the redesigned Deluge protocol as evaluated through field experiments using our test bed can be found in [5].

In a recovery operation, it is critical that messages from the base station requesting re-clustering or reprogramming be authenticated as, otherwise, a denial of service attack can be launched. Our focus is on ensuring that authentication is effective and efficient, with the goal of extending the life of the WSN for as long as possible. A number of authentication methods have been proposed in WSNs such as one-way hash chains (OHC) to authenticate broadcast messages. To tolerate packet loss, multilevel OHCs were introduced in which a higher-level OHC is used to bootstrap low-level OHCs. These methods require time synchronization which is not practical in a low-resource WSN. In using OHCs and other general cryptographic authentication methods in low resource WSNs, problems unique to unicast messages are encountered, for example, generating and storing keys in a highly resource-constrained node.

The TinySec development of Karlof, Sastri and Wagner in 2004 [2] was developed to tackle these problems by providing access control, message authentication, data confidentiality and avoidance of replay attacks. However, TinySec does not specify any key pre-distribution method but assigns a global key to the system, stored always in the same location. Thus, in a WSN deployment, as an attacker need only target a well-known address or area of memory, we did not employ TinySec.

In all schemes using authentication, some kind of key is needed and the question of key storage is critical to maintaining a WSN. In virtually all known schemes for WSNs, keys are stored on nodes and thus require a tamper resistant location or an assumption that an attacker is unable to retrieve the keys.

We now turn to three recent papers which have implemented protocols on low resource motes. The paper [8] implements 27 Crossbow Mica2 motes in a coal mine to detect collapsed areas and to maintain system integrity, especially in case motes are moved or destroyed in a collapse. In [14], the authors implement data delivery mechanisms on MicaZ motes to determine performance limitations in obtaining ‘accurate modeling of wireless signal propagation behavior … and erroneous device behavior’. Finally, the paper [3] discusses the use of Zigbee together with IEEE802.15.4 standards, and points out that they do not support time synchronization to support clusters. They propose a work-around for this, and test on MicaZ motes. We give more details of all three papers in Section 4.

Summary of Component Protocols

In [5], [6] and [7], we presented protocols for reprogramming, re-clustering and authentication as part of the recovery process in a WSN. We assume that the WSN is structured hierarchically with component roles as follows:

Member node – senses and transmits data

Aggregator node (AG) – controls member nodes and senses, collects, aggregates,analyses and transmits data

Base station – controls the system and collects, analyses, transmits and stores data.

As explained in detail in the papers mentioned, to optimize the recovery process, we propose to retain connectivity and maximize flexibility in the network by allowing each node to play the role of either a member or aggregator node as appropriate under the conditions arising, and at the same time, for the entire network to efficiently reorganize itself in order to remain connected. Authentication is implemented both in re-clustering and reprogramming in order to avoid several typical attacks. Aggregator selection is completed in an energy-efficient manner based on a node metric. For the estimation of the node selection metric, each node v broadcasts a HELLO message at a random time instant within a fixed time interval. Each message carries the ID of the transmitting node v, the IDs of all currently known 1-hop neighbors of the transmitting node, and the metric values M that characterize the capabilities of the node v and its neighbors to act as an AG. The metric M(v) of node v is defined as

tmp235_thumb

where e is the available battery energy of the node and emax its maximum value, d is the node degree that should not exceed a pre-defined value dmax, and a is a pre-defined weighting parameter.

Re-clustering

A node v decides to act as a AG if it has never been compromised, and

1. it has a larger metric M(v) than all its 1-hop neighbors and has at least two unconnected neighbors, or if

2. it is in the coverage set formed by its neighbor with the largest metric M.

Reprogramming

Deluge is a reliable data dissemination protocol for large objects, such as program binaries. Together with a boot-loader, Deluge provides a way to reprogram sensor motes in a network. Since Deluge only supports network-wide reprogramming, we modified the dissemination engine of the protocol to individually address sensor nodes. This was done by replacing the AM_BROADCAST_ADDR parameter in the engine with the node-id of the node to be recovered. For reprogramming, we assume that each node in the WSN is within range of the BS, while the converse is not necessarily the case. This is a much stronger assumption than that needed for re-clustering.

Authentication

In each case, a node with IDn contains secret Sn, known only to itself and the BS. R represents a random value (but is in fact the local time obtained from the LocalTime.get() command in TinyOS), M represents a message to reprogram or re-cluster or a request that another node be reprogrammed. H represents a hash function and we assume that n, M and R are the appropriate size for input to H. All message requests to reprogram or to re-cluster between nodes and nodes, or between the BS and nodes, include the node ID of both sender and receiver and a hash of data including the secrets known only to the sender and BS and of the current value for R. This ensures the authenticity of the content and its sender and prevents resending of legitimate messages at a later time by an attacker.

For comparison purposes, we chose two hash functions in implementing the authentication protocol. SHA-1 was chosen as it is popular and widely used in many government standards. Our second choice was the Rabin encryption system [10] adapted for use as a hash function as proposed by Shamir in [11]. While Shamir suggests an adaptation of Rabin’s scheme to what he calls SQUASH, our implementation is constrained by TinyOS requirements and so we use smaller values than those proposed to ensure the security of SQUASH. Such smaller values do not warrant use of the improved SQUASH computations, and so we use Rabin’s scheme as proposed in [10]. The details can be found in [7].

We make the assumption that keys are stored in a tamper resistant location and that an attacker is unable to retrieve them.

Implementation Issues with the Integrated Recovery Protocol MPRH (Multi-Point Relaying with Hash)

In this section, we describe implementation issues with our integrated recovery protocol and the technical issues involved with the integration. This discussion should assist other implementers of WSNs in some of their decisions.

We chose to use MicaZ and TelosB sensor nodes as both are supported by open source TinyOS [13] and IEEE 802.15.4 compliant, a standard in implementing security protocols on hardware motes, while the former is also ZigBee compliant. 802.15.4 defines the physical and MAC layers, whereas ZigBee defines the network and application layers. Long battery life, low cost, small footprint and mesh-networking in supporting communication between large numbers of devices have driven the specifications for 802.15.4 and ZigBee.

TelosB has an 8 MHz TI MSP430 microcontroller processor with 16 KB of instruction memory. The CPU consumes 1.8 mA (at 3 volts) when active, and 5.1uA power when sleeping. The radio is a 2400 MHz to 2483 MHz globally compatible ISM band, delivering up to 250 Kbps of bandwidth on a single shared channel and with a range of up to a few dozen meters. The RFM radio consumes 4.8 mA (at 3 volts) in receive mode, up to 12 mA in transmit mode, and 5^A in sleep mode.

The MicaZ processor has a 4 MHz 8-bit Atmel ATMEGA103 CPU which consumes 5.5 mA (at 3 volts) when active, and about 0.06mA when sleeping. The radio of 916 MHz is considered to be a low-power radio from RFM, delivering up to 40 Kbps bandwidth on a single shared channel with a range of up to a few dozen meters. The RFM radio consumes 4.8 mA (at 3 volts) in receive mode, up to 12 mA in transmit mode, and 5^A in sleep mode.

The main parameters of both motes are given in Table 1. MicaZ only supports 32 bit computation while the 16 bit processor of TelosB supports 64 bit computation.

Table 1. Hardware performance comparison of TelosB and MicaZ motes

Platforms

Processor bits

Integer Computation Bit Support

RAM

ROM

TelosB

16 bits

64 bits


10KB

48KB

MicaZ

8 bits

32 bits

4KB

128KB

Re-clustering

We found that the hardware of our two platforms had an impact on program size after compilation. For the MicaZ platform, the re-clustering algorithm requires about 126KB of RAM making it impossible to run due to the 4KB RAM limitation. However, TelosB consumes only 9.87KB after successful compilation of re-clustering and so can comfortably accommodate the re-clustering algorithm within the 10KB of RAM available.

Reprogramming

As mentioned in Section II, we needed to modify Deluge to reprogram individual sensors. This was done easily by replacing the AM_BROADCAST_ADDR parameter in the engine with the node-id of the node to be recovered. This allowed us to implement the reprogramming protocol effectively.

Authentication

In order to transmit an authentication message using the IEEE 802.15.4 MAC protocol, 64 bit data is required to be fragmented on the sender side (the base station) and re-assembled on the receiver side (at the node) because a payload unit is only 8 bits. Use of either 64-bit SHA-1 code or 64-bit Rabin code consistently resulted in error responses.

We have discussed this problem with developers from the TinyOS community and believe that it could be happening because of a bug in the GCC compiler or could be due to some unidentified hardware constraints. The problem disappeared on both platforms when we reduced our transmitted data size to 32 bit Rabin authentication.

Integration

We encountered several problems during the integration phase Firstly, Deluge reprogramming methods use the original CC2420 driver from TinyOS Official while our re-clustering algorithm was designed based on a MAC protocol which uses a modified CC2420 chip driver in order to incorporate it into the IEEE 802.15.4 standard. In developing TinyOS, the driver set was not designed to be compatible with the IEEE 802.15.4 standard and since Deluge reprogramming was also designed based on this driver set, Deluge is not compatible with IEEE 802.15.4.

In addition, Deluge requires large volumes of RAM and ROM on both our platforms: at least 1.13KB RAM and 31.95KB of ROM on MicaZ; 1.31KB of RAM and 37.40KB of ROM on TelosB.

There are several possible solutions to these problems, including development of a new CC2420 chip driver set based on the IEEE 802.15.4 protocol in the first instance, and revising Deluge to significantly reduce the memory requirements in the second instance. Both of these solutions are outside the scope of this paper and will form the basis of our future work in this area.

We also mention here two issues that arise from the hardware that had an impact on our protocols.

Clock skewing

In implementing the re-clustering algorithm, we encountered time latency due to the drift in the internal mote clocks because of limited processing capacity on both platforms, and also due to network traffic collisions. We were unable to directly deal with time latencies caused by the processing capacity, but covered this problem by allowing larger time intervals than might otherwise be used when dealing with traffic collisions. This process is described under the heading ‘collision resistance’ below.

Collision resistance

Traffic collisions occur when a mote attempts to receive more than one message simultaneously. The Medium Access protocol (MAC) supported by the underlying TinyOS operating on both platforms includes the standard CSMA/CA (Carrier Sense Multiple Access/Collision Avoidance) to manage the network traffic in an optimal way and so reduce the packet loss rate as much as possible.

CSMA/CA is the channel access mechanism used by most wireless LANs; it specifies how the node uses the medium: when to listen, when to transmit etc. In addition to employing the MAC with CMSA/CA, we added a method similar to node beaconing used in [3]. To improve collision avoidance and allow for clock skew caused by the hardware, we set a small random variation for the time interval to prevent multiple nodes from broadcasting messages simultaneously. Table 2 summarizes the capacity of the two platforms to deal with the protocol components.

Table 2. Capabilities of the two platforms ("Yes" means the algorithm is fully supported by a platform; "no" means the algorithm is not supported by the platform)

Algorithms

TelosB

MicaZ

Reasons for "No"

Deluge Reprogramming

Yes

Yes

Authentication protocols

Yes

Yes

Standalone 8bit SHA-1

Yes

Yes

Standalone 8bit Rabin

Yes

Yes

Standalone 16bit SHA-1

Yes

Yes

Standalone 16bit Rabin

Yes

Yes

Standalone 32bit SHA-1

Yes

Yes

Standalone 32bit Rabin

Yes

Yes

Standalone 64bit SHA-1

Yes

No

Limited computing capacity

Standalone 64bit Rabin

Yes

No

Limited computing capacity

Re-clustering

Yes

No

Limited RAM size

Deluge Reprogramming + SHA-1 authentication

Yes

Yes

Deluge Reprogramming + Rabin authentication

Yes

Yes

Re-clustering + Rabin authentication

Yes

No

Limited RAM size

Deluge Reprogramming + re-clustering

No

No

Different sets of CC2420 chip driver

Deluge Reprogramming + authentication + re-clustering

No

No

Limited RAM size and different sets of CC2420 chip driver

Comparison with Implementation Issues of Other Researchers

The paper [8] implements 27 Crossbow Mica2 motes in a coal mine to detect collapsed areas and to maintain system integrity, especially in case motes are moved or destroyed in a collapse. Thus they need to allow for frequent re-verification of neighbor nodes. As the neighbor table method we use is extremely expensive in such a situation, the authors replace this by a beacon method whereby nodes regularly report their existence to the network. In addition, since a collapse initiates a flurry of correspondence within the network, the authors must deal with a serious collision message problem; in fact flooding through the network can result. In order to reduce collisions, they employ two methodologies: randomized forward latency is used to hold back messages from the less important areas of the WSN after a collapse; data aggregation is the second method applied – motes at the edge of a collapse collect messages from the area and aggregate the data in them into a single report message to the base station. Data aggregation is a major component of our own methodology as the AGs of clusters perform this regularly.

In [14], the authors implement data delivery mechanisms on MicaZ motes to determine performance limitations in obtaining ‘accurate modeling of wireless signal propagation behavior … and erroneous device behavior’. The authors point out that changes in the environment can affect WSN performance and so focus on fault-tolerant routing. They use acknowledgment packets when finding traffic routes, despite the extra load on the network, arguing that the trade-off in reliability is worth it. However, their protocols limit acknowledgments to motes close to the destination of the transmitted packet. They also implement wait-time periods during which a mote listens for a response, but after which responses are ignored. Thus, the number of acknowledgements is kept relatively small. In evaluating their protocols, the authors measure the time required for a packet to traverse a WSN. However, this measurement is affected by clock skew – drift in the internal mote clocks. They compensate for this with linear regression methods.

In our work, acknowledgment of packets for the purposes of authentication is necessary. Thus, where authentication is critical, we need all acknowledgments. We dealt with clock skew in our protocols by using time lags in order to allow for receipt of packets.

Finally, the paper [3] discusses the use of Zigbee together with IEEE802.15.4 standards, and points out that they do not support time synchronization for cluster topologies. They propose a work-around for this, based on time divisioning beaconing, and test it on MicaZ motes supported by TinyOS. The protocol allocates a packet sending timing schedule to each AG and its cluster, in order to avoid collisions between packets sent within the cluster. A ‘super-frame’ schedule is also allocated across the WSN. Their protocol aims to align itself with the IEEE 802.14.5 protocol specification which, they point out, does not support a beacon-only approach. Thus, their work heads towards filling a gap in that standard. The authors of [3] identify clock skew as an issue as did the authors of [14].

The time division beacon frame scheduling mechanism in a Zigbee cluster-tree network suggested in [3] is used for a fixed cluster topology and is not compatible with our need to re-cluster a network periodically since the scheduling of traffic relies on a fixed set of parameters in a duty cycle. In order to use this method, the duty cycle parameters would have to be changed to match the new network structure at each re-clustering. This is infeasible in our low-resourced setting.

In summary, the main issues raised in the above papers are clock skewing and collision resistance. In addition, in our work, we cite the lack of support in the motes themselves for proper integration with standards defined by MACs, Zigbee and IEEE 802.15.4, thus making integration across some platforms difficult, if not impossible.

Conclusions and Recommendations

In this paper, we presented an integrated recovery technique which was implemented on both TelosB and MicaZ motes in a low-resource WSN. We described the issues arising in integrating several separate protocols – re-clustering, reprogramming and authentication – using Zigbee with IEEE 802.15.4 and the reprogramming module Deluge, and compared our challenges and solutions with those faced by other researchers in this area.

Based on this analysis and comparison, we now provide recommendations for consideration by researchers who may in future consider integrating several protocols on these platforms.

• The use of data aggregation in order to reduce collisions.

• A reduction in the number of acknowledgment packets where possible.

• Integration of the time division beaconing methods of [3] with the IEEE specifications for routing.

• Recognition of the difficulties in integrating Zigbee and IEEE 802.15.4 across some platforms.

In particular, relevant to the last point, the TinyOS driver set needs to be redesigned so as to incorporate the IEEE 802.15.4 standard.

Next post:

Previous post: