P1 = -1.05 P2    (1)
P3 = -1.01 P4    (2)
P5 = -1.03 P6    (3)
P1 = 3.33 P3     (4)
P1 = -3.37 P4    (5)
P1 = 2.86 P5     (6)
At times T2 and T3, equations (4), (5) and (6) (and the rest of the potential linear
equations) no longer hold, and the model of the normal relationships between the data
readings reduces to (1), (2) and (3). If these equations do not hold in a future
test data set, this could indicate data corruption or loss, or the manipulation of data by
a malicious attacker. In practice, an approximate model will be fitted to the training
data (e.g. by least squares) and its residual computed.
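The fitting step can be sketched as follows. This is a minimal illustration rather than the procedure used in the experiments: it assumes each invariant has the form y = k * x with no intercept, and the function names, the training readings and the tolerance are all hypothetical.

```python
def fit_ratio(xs, ys):
    """Least-squares slope k for the model y = k * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a ratio to all-zero readings")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def rms_residual(k, xs, ys):
    """Root-mean-square residual of y = k * x over the given readings."""
    return (sum((y - k * x) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def holds(k, x, y, tol):
    """True if a single pair of readings satisfies y = k * x within tol."""
    return abs(y - k * x) <= tol


# Hypothetical training readings for P2 and P1 (invented for illustration).
p2_train = [10.0, 20.0, -15.0, 30.0]
p1_train = [-10.5, -21.0, 15.75, -31.5]

k = fit_ratio(p2_train, p1_train)          # slope of P1 = k * P2
residual = rms_residual(k, p2_train, p1_train)

# Assumed tolerance; in practice it would be derived from the training residual.
TOL = 0.5
ok_reading = holds(k, 20.0, -21.0, TOL)    # consistent with the invariant
bad_reading = holds(k, 20.0, 21.0, TOL)    # sign of P1 reversed
```

A test reading whose residual exceeds the tolerance is treated as violating the invariant. How the tolerance is chosen is left open here.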
An advantage of this technique is that the beliefs that are encapsulated in the
invariants can be used to form beliefs about the components of the invariants. For
example, if the power readings at each end of a link between two buses do not satisfy
the linear relationship, then one of the power readings must be at fault. Information
about the range of typical readings and the last known breaker state can then be used
to discover whether there is an error in the power sensor reading or in the breaker
sensor reading. This allows you to connect topology information and power readings
locally and adjust the weights on the input to the state estimator.
A limitation of this approach is that you can only identify incorrect readings by
looking at the relationships of the two candidates with other correct readings. If a sign
reverses on P1, equation (1) will no longer hold, but it will not be known whether this
is because P2 should be negative or P1 positive unless there are further equations
linking P1 and P2 with other readings. These further equations may not always be
available if there is a substantial amount of corruption.
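This limitation can be illustrated with a small sketch. It is one possible heuristic, not the method of the text: every equation is checked, any reading that appears in a satisfied equation is treated as trustworthy, and the remaining readings are ranked by how many violated equations they appear in. The readings, the tolerance and the function name are hypothetical.

```python
from collections import Counter

# Each invariant is (lhs, k, rhs), meaning lhs = k * rhs; coefficients from (1)-(6).
EQUATIONS = [
    ("P1", -1.05, "P2"),
    ("P3", -1.01, "P4"),
    ("P5", -1.03, "P6"),
    ("P1", 3.33, "P3"),
    ("P1", -3.37, "P4"),
    ("P1", 2.86, "P5"),
]


def suspects(equations, readings, tol):
    """Readings that appear only in violated equations, with violation counts."""
    violated = Counter()
    cleared = set()
    for lhs, k, rhs in equations:
        if abs(readings[lhs] - k * readings[rhs]) <= tol:
            cleared.update((lhs, rhs))      # both ends agree, so trust both
        else:
            violated.update((lhs, rhs))     # one of the two ends is wrong
    return {name: n for name, n in violated.items() if name not in cleared}


# Hypothetical readings, approximately consistent with all six equations.
good = {"P1": 100.0, "P2": -95.24, "P3": 30.03,
        "P4": -29.67, "P5": 34.97, "P6": -33.95}
corrupt = dict(good, P1=-100.0)             # sign reversal on P1

TOL = 0.5                                   # assumed tolerance
full_model = suspects(EQUATIONS, corrupt, TOL)
reduced_model = suspects(EQUATIONS[:3], corrupt, TOL)
```

With all six equations, P1 appears in four violated equations and P2 in only one, so P1 is the more likely culprit. With only equations (1), (2) and (3), P1 and P2 each appear in one violated equation and cannot be told apart.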
5 Previous Work
5.1 N-Grams
Marc Damashek was one of the first to develop the n-gram technique [3]. His system
has been used successfully to classify documents independently of errors and
language. In the application of this technique presented here, the aim has been rather
different, since although the format of the data is ultimately unimportant, the errors
are critical and so Damashek's statistical approach could not be adopted unaltered.
The simplified non-statistical version of Damashek's technique used in these
experiments is also similar to Stephanie Forrest's sequence time-delay embedding
(stide) methodology, described in [7] and elsewhere, which was used to track the
behaviour of applications by identifying abnormal sequences of their system calls.
The focus in Forrest's work is on the behaviour of the system, not on the data passed
around it, and there was little need in the context of her work to track down the exact
position of errors and suggest corrections.
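The idea of a non-statistical n-gram lookup can be sketched as below. This is a generic sliding-window illustration in the spirit of the approaches described above, not Damashek's or Forrest's implementation. The window length, the traces and the function names are assumptions made for the example.

```python
def ngrams(seq, n):
    """All contiguous windows of length n, in order of position."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_normal_db(traces, n):
    """Set of every n-gram seen in the normal (training) traces."""
    db = set()
    for trace in traces:
        db.update(ngrams(trace, n))
    return db


def anomalous_positions(seq, db, n):
    """Start positions of windows in seq that never occurred in training."""
    return [i for i, gram in enumerate(ngrams(seq, n)) if gram not in db]


# Hypothetical traces; the symbols could equally be system calls or data fields.
N = 3
normal = [["open", "read", "read", "write", "close"]]
db = build_normal_db(normal, N)

test_trace = ["open", "read", "write", "close"]
flagged = anomalous_positions(test_trace, db, N)
```

The returned positions show where the unfamiliar windows begin. Locating errors in this way matters more in the present application than it did in Forrest's setting.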