Deriving Specifications for Systems That Are Connected to the Physical World - Formal Methods and Hybrid Real-Time Systems

3 Addressing Component Failures

In a critical system, or any system in which it is important to limit the possible damage to the equipment, all assumptions must be systematically questioned. Potential faults must be identified and the software must deal with them appropriately.

It is pointed out in Section 1.4 that it is desirable to layer a specification by separating the behaviour under different sets of assumptions: from the most optimistic (no faults in external components) through to minimal behaviour, which might involve setting off alarms.
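One informal way to picture such layering is as an ordered list of (assumption, behaviour) pairs, tried from the most optimistic to the most pessimistic. The sketch below is only an illustration of that reading, not the notation of Section 1.4: the names `Layer`, `select_layer`, `LAYERS` and the state keys `sensors_ok` and `motor_ok` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of what is currently believed about the components.
State = Dict[str, bool]


@dataclass
class Layer:
    name: str
    assumption: Callable[[State], bool]  # when this layer's behaviour applies
    behaviour: str                       # informal description of the obligation


def select_layer(layers: List[Layer], state: State) -> Layer:
    """Return the first (most optimistic) layer whose assumption holds."""
    for layer in layers:
        if layer.assumption(state):
            return layer
    raise ValueError("no layer applies to this state")


# Ordered from most optimistic to minimal behaviour (illustrative only).
LAYERS = [
    Layer("normal",
          lambda s: s["sensors_ok"] and s["motor_ok"],
          "meet the full requirement"),
    Layer("minimal",
          lambda s: True,  # no assumptions about external components
          "stop and set off an alarm"),
]
```

The last layer assumes nothing, so some behaviour is always specified; intermediate layers with weaker assumptions could be placed between the two shown.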

One way to undertake such a division is to treat the separate systems as different problems and to look at their combination with programming combinators. In the world of “normal design” such decompositions might be standard and the choice of components be so accepted that one could indeed just use the techniques presented so far to specify the individual problems.

Computer technology has, however, developed so fast that many problems fall into the “radical design” category. We should in any case like to be able to deduce properties of an overall system. The source of the difficulty with which we have struggled is the continuous-time specifications which our applications have forced us to employ. It is not difficult to describe normal behaviour as in Section 2; describing fault-tolerant behaviour uses similar notation plus the ideas in this section. The key issue is how to describe the handover between the normal and fault-tolerant phases of operation. Our ideas for this will appear elsewhere but an indication of the approach is given in Section 4.3.

3.1 Faults in the Sluice Gate System

In our treatment of the sluice gate example so far, we have focused on the situation where all of the (physical) components operate faultlessly. We now consider what sorts of issues arise when trying to cope with component failure.

In the sluice gate problem, components like sensors can fail; for example, they can become stuck false or they can become stuck true. Moreover, the motor could burn out and no longer be able to move the gate when power is applied to it. Such component failures are faults in the larger system and a useful control program will limit their impact even if it cannot meet the original requirements. In [Jac00] this obligation is called the reliability concern. If a faulty component is detected, the Control Machine should, perhaps, switch off the motor and turn on an alarm to indicate that the system needs attention from the maintenance engineer and that the irrigation requirement is no longer being satisfied.
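As a rough illustration of the reaction just described, the sketch below switches the motor off and raises an alarm once a fault is suspected. The detection rules are assumptions made for this example and do not come from the source: both position sensors reading true at once is treated as inconsistent, and a motor that has run longer than a hypothetical `MAX_TRAVEL_SECONDS` is treated as a sign of a stuck-false sensor or a burnt-out motor. All names (`Reading`, `Command`, `detect_fault`, `react`) are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical bound on how long a healthy motor needs to move the gate.
MAX_TRAVEL_SECONDS = 30.0


class Fault(Enum):
    NONE = "none"
    SENSORS_INCONSISTENT = "sensors_inconsistent"  # top and bottom both true
    NO_ARRIVAL = "no_arrival"                      # motor ran too long


@dataclass
class Reading:
    top: bool               # top position sensor
    bottom: bool            # bottom position sensor
    motor_on: bool
    seconds_running: float  # time since the motor was last switched on


@dataclass
class Command:
    motor_on: bool
    alarm_on: bool


def detect_fault(r: Reading) -> Fault:
    """Apply the assumed detection rules to one reading."""
    if r.top and r.bottom:
        # The gate cannot be fully open and fully closed at the same time.
        return Fault.SENSORS_INCONSISTENT
    if r.motor_on and r.seconds_running > MAX_TRAVEL_SECONDS:
        # A healthy system would have reached a sensor and stopped by now.
        return Fault.NO_ARRIVAL
    return Fault.NONE


def react(r: Reading, desired_motor_on: bool) -> Command:
    """On a suspected fault, stop the motor and raise the alarm."""
    if detect_fault(r) is not Fault.NONE:
        return Command(motor_on=False, alarm_on=True)
    return Command(motor_on=desired_motor_on, alarm_on=False)
```

Note that the timeout rule cannot tell a stuck-false sensor from a burnt-out motor; it only signals that the system needs attention, which matches the cautious "should, perhaps" wording above.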

It will become clear that it is more difficult to maintain our isolation from details of the physical world when we examine fault-tolerance, but we will examine ways in which such considerations can be brought in gradually.

It would be possible to follow the method described above with weaker assumptions about the physical components (and additional requirements with respect to alarms) but the resulting specification might become opaque because it would lack structure. One would like to achieve a structure which preserved
