9.3 Aggregation of Evidence across Multiple Streams of Data
Simultaneous monitoring of signals coming from distinct sources, or sur-
veying diverse aspects of data even if it comes from a single source, can yield
improvements in accuracy, sensitivity, specificity, and timeliness of event
detection over more common single-stream analyses. That is possible if the
individual streams contribute corroborating evidence to support determination
of the evaluated hypotheses. For instance, in the public health domain, patient
data collected at emergency rooms in hospitals can be used to corroborate
epidemiological hypotheses derived by monitoring sales of certain classes of
non-prescription medicines at nearby drugstores.
The best approach to handle multiple streams of such data is to model
them jointly. However, it may not always be feasible in practice, given limited
understanding of complex interactions of data between and within streams,
and given limited amounts of data available for the joint model estimation.
A typical way to overcome this complexity is to develop a separate event detector
for each individual stream and then raise an alert whenever any of them
indicates an abnormality (or, equivalently, to apply the Minimum operator to the
set of single-stream p-values, and to base the alert decision on its result).
Unfortunately, information stemming from between-stream interactions is
not used in this approach.
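The equivalence between "any detector alerts" and "the minimum p-value triggers the alert" can be sketched in a few lines. This is a minimal illustration only: the function names, the threshold `alpha = 0.05`, and the example p-values are assumptions introduced here, not taken from the text.

```python
from typing import Sequence


def any_detector_alerts(p_values: Sequence[float], alpha: float = 0.05) -> bool:
    """Alert if at least one single-stream detector is significant on its own."""
    return any(p <= alpha for p in p_values)


def minimum_rule_alerts(p_values: Sequence[float], alpha: float = 0.05) -> bool:
    """Alert if the Minimum operator applied to the p-values falls at or below alpha."""
    return min(p_values) <= alpha


# Hypothetical p-values from three single-stream detectors.
streams = [0.20, 0.03, 0.60]

# Both formulations give the same decision for any input.
print(any_detector_alerts(streams), minimum_rule_alerts(streams))
```

Either form looks at each stream in isolation, which is why evidence shared across streams plays no role in the decision.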
A useful alternative is the method that probabilistically aggregates p-values
derived from anomaly detectors built for individual streams (Roure et al.
2007). The aggregated p-value represents the consensus estimate of strangeness.
Since p-values follow a uniform distribution under the null hypothesis,
Fisher's statistic (minus twice the sum of the natural logarithms of the m
independent p-values) has a χ² distribution with 2m degrees of freedom. There
exists a closed-form solution for the combined p-value, p_F:
$$p_F = k \sum_{i=0}^{m-1} \frac{(-\ln k)^i}{i!}, \quad \text{where} \quad k = \prod_{i=0}^{m-1} p_i.$$
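As a rough sketch, the closed form for the combined p-value can be evaluated directly from the product of the component p-values. The function name `fisher_combined_p` and the input checks are assumptions made for this illustration; the formula itself is the standard upper-tail probability of a χ² distribution with 2m degrees of freedom evaluated at the Fisher statistic.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine m independent p-values with Fisher's method (closed form).

    Computes p_F = k * sum_{i=0}^{m-1} (-ln k)^i / i!, where k is the
    product of the component p-values.
    """
    if len(p_values) == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")

    # Work with ln k as a sum of logs to avoid underflow in the product.
    log_k = sum(math.log(p) for p in p_values)
    k = math.exp(log_k)

    total = 0.0
    term = 1.0  # (-ln k)^0 / 0!
    for i in range(len(p_values)):
        total += term
        term *= -log_k / (i + 1)  # next term: (-ln k)^(i+1) / (i+1)!
    return k * total


# Hypothetical p-values from three independent streams.
print(fisher_combined_p([0.06, 0.07, 0.08]))
```

With a single stream the sum has one term and the combined value reduces to the original p-value, which is a convenient sanity check.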
Fisher's method is sensitive to situations where the component p-values are
only slightly greater than the critical value. That enables flagging cases in
which the individual streams are of marginal interest on their own, but appear
unusual when the corresponding pieces of evidence are combined. On the
other hand, it is more conservative than the Minimum algorithm when any
of the component p-values is substantially greater than the critical value. A
conservative approach makes sense in many practical situations involving noisy data.
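The contrast between the two rules can be illustrated with two hypothetical sets of p-values. Everything specific here is an assumption for the sake of the example: the threshold `alpha = 0.05`, the helper names, and the p-values themselves. The Fisher computation is restated compactly so that the block runs on its own.

```python
import math


def fisher_p(p_values):
    """Closed-form Fisher combination: k * sum_{i<m} (-ln k)^i / i!."""
    log_k = sum(math.log(p) for p in p_values)
    m = len(p_values)
    return math.exp(log_k) * sum(
        (-log_k) ** i / math.factorial(i) for i in range(m)
    )


def compare(p_values, alpha=0.05):
    """Return the alert decision of each rule at threshold alpha."""
    return {
        "minimum": min(p_values) <= alpha,
        "fisher": fisher_p(p_values) <= alpha,
    }


# Case 1: every stream is only marginally above the threshold.
marginal = [0.06, 0.07, 0.08]
# Case 2: one stream is below the threshold, the others are far above it.
one_low = [0.04, 0.90, 0.95]

print("marginal:", compare(marginal))
print("one_low: ", compare(one_low))
```

In the first case no single stream crosses the threshold, yet the combined p-value is about 0.014, so only Fisher's method alerts. In the second case the Minimum rule alerts on the single low value, while the combined p-value is about 0.34 and Fisher's method stays silent.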
The example result shown in Figure 9.4 considers three streams of inde-
pendently collected food and agriculture safety data involving records of
daily counts of condemned and healthy cattle (Stream A), counts of posi-
tive and negative microbial tests of food samples (Stream B), and counts of