Safeguarding SCADA Systems with Anomaly Detection - Computer Network Security

P1 = -1.05 P2    (1)
P3 = -1.01 P4    (2)
P5 = -1.03 P6    (3)
P1 =  3.33 P3    (4)
P1 = -3.37 P4    (5)
P1 =  2.86 P5    (6)

At times T2 and T3, equations (4), (5) and (6) (and the rest of the potential linear equations) no longer hold, and the model of the normal relationships between the data readings reduces to (1), (2) and (3). If these equations do not hold in a future test data set, this could indicate data corruption or loss, or the manipulation of data by a malicious attacker. In practice, an approximate model will be fitted to the training data (e.g. by least squares) and its residual computed.
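The fitting step can be illustrated with a minimal sketch. It assumes each invariant has the form y = k * x, as in equations (1) to (6), and estimates k by least squares through the origin. The training values, the function names and the threshold of five times the training residual are all hypothetical choices made for illustration, not values taken from the experiments.

```python
def fit_ratio(xs, ys):
    """Least-squares estimate of k in y ~ k * x (line through the origin)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a ratio when all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def rms_residual(xs, ys, k):
    """Root-mean-square residual of the fitted model on the given data."""
    return (sum((y - k * x) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def violates(x, y, k, threshold):
    """True if a new pair of readings departs from y ~ k * x by more than threshold."""
    return abs(y - k * x) > threshold


# Hypothetical training readings for P2 (x) and P1 (y), roughly P1 = -1.05 P2.
p2_train = [100.0, 120.0, 80.0, 95.0]
p1_train = [-105.2, -125.8, -84.1, -99.6]

k = fit_ratio(p2_train, p1_train)
# Assumed alarm threshold: five times the residual seen on the training data.
threshold = 5 * rms_residual(p2_train, p1_train, k)

normal_ok = not violates(110.0, -115.4, k, threshold)   # consistent pair
sign_flip_caught = violates(100.0, 105.0, k, threshold)  # sign of P1 reversed
```

A larger threshold tolerates more sensor noise at the cost of missing small manipulations, so in practice it would need tuning against the training residuals.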

An advantage of this technique is that the beliefs that are encapsulated in the invariants can be used to form beliefs about the components of the invariants. For example, if the power readings at each end of a link between two buses do not satisfy the linear relationship, then one of the power readings must be at fault. Information about the range of typical readings and the last known breaker state can then be used to discover whether there is an error in the power sensor reading or in the breaker sensor reading. This allows you to connect topology information and power readings locally and adjust the weights on the input to the state estimator.

A limitation of this approach is that you can only identify incorrect readings by looking at the relationships of the two candidates with other correct readings. If the sign of P1 reverses, equation (1) will no longer hold, but it will not be known whether the fault lies with P1 or with P2 unless there are further equations linking P1 and P2 with other readings. These further equations may not always be available if there is a substantial amount of corruption.
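This limitation can be made concrete with a small sketch. It assumes a simple heuristic that is not taken from the text: a reading is a suspect if it appears in every violated invariant and in no satisfied one. The readings, the 5% tolerance and the function names are hypothetical.

```python
# Invariants of the form a = coeff * b, taken from equations (1) to (6).
INVARIANTS = {
    1: ("P1", "P2", -1.05),
    2: ("P3", "P4", -1.01),
    3: ("P5", "P6", -1.03),
    4: ("P1", "P3", 3.33),
    5: ("P1", "P4", -3.37),
    6: ("P1", "P5", 2.86),
}


def violated(readings, invariants, rel_tol=0.05):
    """Return the numbers of the invariants that the readings do not satisfy."""
    bad = []
    for num, (a, b, coeff) in invariants.items():
        expected = coeff * readings[b]
        if abs(readings[a] - expected) > rel_tol * max(abs(expected), 1e-9):
            bad.append(num)
    return bad


def suspects(readings, invariants, rel_tol=0.05):
    """Readings common to all violated invariants and absent from satisfied ones."""
    bad = violated(readings, invariants, rel_tol)
    if not bad:
        return set()
    vouched = set()
    for num, (a, b, _) in invariants.items():
        if num not in bad:
            vouched.update((a, b))
    candidates = None
    for num in bad:
        a, b, _ = invariants[num]
        candidates = {a, b} if candidates is None else candidates & {a, b}
    return candidates - vouched


# Hypothetical readings that satisfy all six invariants within tolerance.
good = {"P1": -105.0, "P2": 100.0, "P3": -31.53,
        "P4": 31.16, "P5": -36.71, "P6": 35.64}
faulty = dict(good, P1=105.0)  # sign of P1 reversed

# With all six invariants available, P1 is singled out.
with_all = suspects(faulty, INVARIANTS)
# With only (1), (2) and (3), P1 and P2 cannot be told apart.
reduced = {n: INVARIANTS[n] for n in (1, 2, 3)}
with_reduced = suspects(faulty, reduced)
```

With the full set of equations the sketch isolates P1, because equations (4), (5) and (6) also fail while (2) and (3) vouch for the other readings. Once the model has reduced to (1), (2) and (3), only equation (1) involves P1 or P2, so both remain suspects.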

5 Previous Work

5.1 N-Grams

Marc Damashek was one of the first to develop the n-gram technique [3]. His system has been used successfully to classify documents independently of errors and language. In the application of this technique presented here, the aim has been rather different: although the format of the data is ultimately unimportant, the errors are critical, and so Damashek's statistical approach could not be adopted unaltered.

The simplified non-statistical version of Damashek's technique used in these experiments is also similar to Stephanie Forrest's sequence time-delay embedding (stide) methodology, described in [7] and elsewhere, which was used to track the behaviour of applications by identifying abnormal sequences of their system calls. The focus in Forrest's work is on the behaviour of the system, not on the data passed around it, and there was little need in the context of her work to track down the exact position of errors and suggest corrections.
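As a rough illustration of the non-statistical idea shared by both approaches, the sketch below records every n-gram seen in normal sequences and then reports the positions of windows in a test sequence that were never seen. It is an assumed minimal form, not the implementation used in the experiments or in stide itself. The function names, the window length and the system call traces are hypothetical.

```python
def ngrams(seq, n):
    """Set of all length-n windows in a sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def build_profile(training_sequences, n):
    """Union of the n-grams observed across all normal training sequences."""
    profile = set()
    for seq in training_sequences:
        profile |= ngrams(seq, n)
    return profile


def anomalous_positions(seq, profile, n):
    """Start indices of windows in seq whose n-gram is absent from the profile."""
    return [i for i in range(len(seq) - n + 1)
            if tuple(seq[i:i + n]) not in profile]


# Hypothetical traces of system calls observed during normal operation.
training = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "write", "close"],
]
profile = build_profile(training, n=3)

# A test trace containing a repeated "write" that never occurred in training.
test_trace = ["open", "read", "write", "write", "close"]
flagged = anomalous_positions(test_trace, profile, n=3)
```

Because each flagged index marks the start of an unseen window, overlapping flagged windows narrow down where in the sequence the abnormal element lies, which matters when the position of an error must be tracked down.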
