for the most interesting projections of multidimensional surveillance data, often encountered in adverse event detection or emerging pattern-tracking applications. Popular methods used to handle such tasks, such as temporal or spatial scanning (Kulldorff 1997, Kulldorff et al. 2007, Neill and Moore 2004, Naus and Wallenstein 2006, Dubrawski et al. 2007a, Neill and Cooper 2009), rely on more or less exhaustive screening of data for subsets that reveal unusual behaviors or that match tracked patterns, and on evaluating each subset using aggregate statistics such as contingency tables or marginal distributions. This creates a requirement for scalable aggregation of data to support comprehensive surveillance in a computationally feasible manner. Another practical context in which the need for data aggregation is apparent involves exploiting corroborating evidence obtained from distinct sources. Aggregating multiple signals often improves the detectability of events, and it can boost the statistical reliability of the involved models and findings.
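As a rough illustration of the aggregate statistics mentioned above, the sketch below collapses raw event records into a two-way contingency table and its marginal distributions. The field names and values (`region`, `symptom`, and so on) are invented for the example, not taken from any surveillance system described in this chapter.

```python
from collections import Counter

# Hypothetical event records; in practice these would come from a
# surveillance database with many more categorical dimensions.
records = [
    {"region": "north", "symptom": "fever"},
    {"region": "north", "symptom": "rash"},
    {"region": "south", "symptom": "fever"},
    {"region": "north", "symptom": "fever"},
]

# Joint counts form a 2-way contingency table over (region, symptom).
joint = Counter((r["region"], r["symptom"]) for r in records)
# Marginal distributions over each dimension separately.
by_region = Counter(r["region"] for r in records)
by_symptom = Counter(r["symptom"] for r in records)

print(joint[("north", "fever")])  # count for one cell of the table
print(by_region["north"])         # marginal count for one region
```

Scan-based detectors evaluate statistics like these over very many candidate subsets of the data, which is what makes fast aggregation a prerequisite for exhaustive screening.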
This chapter discusses the utility of data aggregation and a few computationally efficient implementations of it, using example applications in the areas of public health and food safety surveillance.
The next section introduces a data structure designed to efficiently represent large sets of multidimensional event data of the types often encountered in health surveillance. It is called T-Cube, and it is an extension of the AD-Tree: an in-memory data structure that efficiently caches answers to all conceivable queries against multidimensional databases of categorical variables (Moore and Lee 1998). T-Cube extends the idea of the AD-Tree toward an important task of very fast aggregation of multidimensional time series of counts of events. The attainable efficiencies support human users, who benefit from very fast responses to queries and from dynamic visualizations of data at interactive speeds. Fast querying also allows for large-scale mining of multidimensional categorical aggregates. Exhaustive searches for patterns in highly dimensional categorical data become computationally tractable, which minimizes the risk of missing something important by having to resort to a selective mode of surveillance.
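The chapter does not reproduce T-Cube's internals at this point, but the caching idea it inherits from the AD-Tree can be sketched as follows: precompute the count time series for every combination of attribute values (with `*` standing for "any value"), so that subsequent queries reduce to look-ups instead of data scans. This is a deliberately naive, hypothetical rendering; the actual AD-Tree and T-Cube structures avoid materializing the full exponential set of combinations.

```python
from collections import defaultdict
from itertools import product

def build_cube(events, n_weeks):
    """Cache an event-count time series for every attribute-value combination.

    `events` is a list of (attribute_tuple, week_index) pairs; a '*' in a
    key position acts as a wildcard. Memory cost here is exponential in the
    number of attributes -- the real structures are far more frugal.
    """
    cube = defaultdict(lambda: [0] * n_weeks)
    for attrs, week in events:
        # Index the event under every generalization of its attributes.
        for key in product(*[(a, "*") for a in attrs]):
            cube[key][week] += 1
    return cube

# Illustrative events: (region, symptom) pairs with a week index.
events = [(("north", "fever"), 0), (("north", "fever"), 1),
          (("south", "rash"), 1)]
cube = build_cube(events, n_weeks=2)
print(cube[("north", "fever")])  # series for one exact combination
print(cube[("*", "*")])          # overall weekly totals
```

Once built, any query over any subset of attribute values returns its aggregate time series in constant time, which is what makes exhaustive scanning and interactive visualization feasible.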
Aggregation of evidence across multiple streams of data is the topic of
the subsequent section. Exploiting corroborating evidence from separate
sources has many practical applications. The approach used here as an
illustration employs single-stream anomaly detectors and Fisher's method
of p-value aggregation to construct powerful multi-stream detectors (Roure
et al. 2007).
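Fisher's method combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below implements this with the standard library only, using the closed-form chi-square survival function available for even degrees of freedom; the single-stream p-values in the usage line are made-up illustrative numbers, not results from the cited detectors.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with df = 2k under the null; for
    even df the survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i)
                                 for i in range(k))

# Three individually unremarkable single-stream p-values (illustrative)
# can combine into much stronger multi-stream evidence.
print(fisher_combine([0.08, 0.05, 0.10]))
```

This captures the appeal of corroborating evidence: none of the three streams alone is significant at the 0.05 level, yet their combination is.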
Section 4 deals with feature aggregation for cross-stream analysis. As the
events of interest become sparser, it is typically more difficult to construct
reliable models of correlations between streams of data. Data aggregation
can sometimes come to the rescue, as in the example task of predicting the
risk of occurrence of Salmonella at a food factory, given its recent history of
sanitary inspections.
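A minimal sketch of what such feature aggregation might look like is given below. The record layout, window size, and feature names are all hypothetical stand-ins for the chapter's actual predictor inputs: the point is only that a factory's raw inspection history gets collapsed into a few summary features that a risk model can consume even when positive Salmonella events are rare.

```python
def aggregate_inspections(history, window=6):
    """Collapse a factory's recent inspection history into summary features.

    `history` is a list of (passed: bool, violations: int) tuples, most
    recent last. Window size and feature definitions are illustrative.
    """
    recent = history[-window:]
    n = len(recent)
    return {
        "n_inspections": n,
        "fail_rate": sum(1 for passed, _ in recent if not passed) / n,
        "total_violations": sum(v for _, v in recent),
        "failed_last": 0 if recent[-1][0] else 1,
    }

# Hypothetical history: four inspections, two of them failed.
history = [(True, 0), (False, 3), (True, 1), (False, 2)]
print(aggregate_inspections(history))
```

Aggregated features of this kind trade temporal detail for statistical stability, which is exactly the compromise the paragraph above describes for sparse events.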