9.6 Conclusions
Data aggregation is a common subtask in many contexts and phases of biosurveillance. We have reviewed a few examples of its practical use. They were selected to show that it is sometimes possible to structure the approach to biosurveillance opportunistically in order to reap substantial benefits from data aggregation. Specific scenarios involve aggregating evidence across multiple subsets of the same stream of multidimensional data, across multiple separate streams of data, and across multiple distinct entities, the subjects of surveillance.
Aggregation can be the pragmatic strategy of choice when the data provide little evidence for each individual hypothesis to be tested, or for each individual entity whose behavior is to be monitored. Sharing relevant information among similar entities and combining signals obtained from distinct sources can boost the reliability of the models used in surveillance and improve the statistical significance of the results.
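As a minimal sketch of why combining sources can raise statistical significance, assume that each stream's daily count is an independent Poisson variable with a known baseline mean. Under that null hypothesis, the sum of the streams is Poisson with the summed mean, so modest excesses that are individually unremarkable can add up to a small pooled p-value. The function name `poisson_upper_tail`, the counts, and the baselines are hypothetical, and independence with known baselines is a simplifying assumption. This illustrates the principle only and is not one of the specific methods reviewed in this chapter.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # accumulate P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical data: today's counts and assumed-known baseline means for three
# independent streams (e.g. three data sources covering the same region).
observed = [14, 13, 15]
baseline = [10.0, 10.0, 10.0]

# One-sided p-value for each stream taken on its own.
per_stream_p = [poisson_upper_tail(k, lam) for k, lam in zip(observed, baseline)]

# Under independence, the summed count is Poisson with the summed mean.
pooled_p = poisson_upper_tail(sum(observed), sum(baseline))

print("per-stream p-values:", [round(p, 3) for p in per_stream_p])
print("pooled p-value:", round(pooled_p, 3))
```

With these numbers, every per-stream p-value stays above 0.05, while the pooled p-value falls below it.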
In some applications, such as those involving spatial or temporal scan algorithms, data aggregation can place a serious burden on computational resources. One way to mitigate this is to cache sufficient statistics of the data ahead of analysis time. Caching reduces latencies due to information retrieval and allows data-intensive algorithms to run faster. To illustrate this, we used the example of T-Cube, a data structure that enables very fast aggregation of multidimensional time series of counts of events. The efficiencies provided by T-Cube support human users, who benefit from very fast responses to queries and from dynamic visualizations of data at interactive speeds. Fast querying also enables large-scale mining of multidimensional categorical aggregates. Moreover, exhaustive searches for patterns in high-dimensional categorical data become computationally tractable. This minimizes the risk of missing something important by having to resort to selective surveillance, and it maximizes the chances of obtaining useful information when and where it is needed.
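The caching principle can be illustrated with a naive, fully materialized count cube: one pass over the records precomputes a daily count series for every combination of attribute values, with `*` standing for "any value", so each query becomes a dictionary lookup and an exhaustive scan of all cells is cheap. This is a minimal sketch under stated assumptions, not the T-Cube data structure itself. The names `build_cube`, `query`, `spike_score`, and `scan`, the add-one ratio score, and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

WILDCARD = "*"  # stands for "any value" of an attribute


def build_cube(records, num_attributes, num_days):
    """One pass over the records: add each record to the daily count series of
    every cell it belongs to, i.e. every way of replacing some of its attribute
    values with the wildcard (2 ** num_attributes cells per record)."""
    cube = defaultdict(lambda: [0] * num_days)
    positions = range(num_attributes)
    for day, values in records:
        for r in range(num_attributes + 1):
            for kept in combinations(positions, r):
                key = tuple(values[i] if i in kept else WILDCARD for i in positions)
                cube[key][day] += 1
    return dict(cube)


def query(cube, key, num_days):
    """Answer a count query by lookup instead of rescanning the raw records."""
    return cube.get(key, [0] * num_days)


def spike_score(series):
    """Hypothetical heuristic: last day's count relative to the mean of the
    earlier days, with add-one smoothing to avoid division by zero."""
    baseline = sum(series[:-1]) / (len(series) - 1)
    return (series[-1] + 1) / (baseline + 1)


def scan(cube):
    """Exhaustive search: score every cached cell and return the top key."""
    return max(cube, key=lambda key: spike_score(cube[key]))


# Hypothetical records: (day index, (region, symptom)).
records = [
    (0, ("north", "cough")), (0, ("south", "cough")), (0, ("south", "fever")),
    (1, ("north", "cough")), (1, ("south", "cough")), (1, ("south", "fever")),
    (2, ("north", "fever")), (2, ("south", "cough")), (2, ("south", "fever")),
    (3, ("north", "cough")), (3, ("south", "cough")),
    (3, ("south", "fever")), (3, ("south", "fever")), (3, ("south", "fever")),
]

cube = build_cube(records, num_attributes=2, num_days=4)
print(query(cube, ("south", WILDCARD), num_days=4))
print(scan(cube))
```

The full cube stores up to 2^n cells per record for n attributes, so its memory footprint grows quickly with dimensionality; it is meant only to show how precomputed aggregates turn queries and exhaustive scans into cheap lookups.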
Acknowledgments
This work was supported in part by the U.S. Department of Agriculture (award 1040770), the Centers for Disease Control and Prevention (award R01-PH000028), the National Science Foundation (grant IIS-0325581), and the International Development Research Centre of Canada (project 105130).
Many thanks to Andrew Moore, Jeff Schneider, Daniel Neill, Josep Roure,
Maheshkumar Sabhnani, Purnamrita Sarkar, John Ostlund, Lujie Chen,
 