for the most interesting projections of multidimensional surveillance data, often encountered in adverse event detection or emerging pattern-tracking applications. Popular methods used to handle such tasks, such as temporal or spatial scanning (Kulldorff 1997, Kulldorff et al. 2007, Neill and Moore 2004, Naus and Wallenstein 2006, Dubrawski et al. 2007a, Neill and Cooper 2009), rely on more or less exhaustive screening of data for subsets that reveal unusual behaviors or that match tracked patterns, and on evaluating each subset using aggregate statistics such as contingency tables or marginal distributions. This creates a requirement for scalable aggregation of data to support comprehensive surveillance in a computationally feasible manner. Another practical context in which the need for data aggregation is apparent involves exploiting corroborating evidence obtained from distinct sources. Aggregating multiple signals often improves the detectability of events, and it can boost the statistical reliability of the involved models and findings.
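As a rough illustration of the aggregate statistics mentioned above, the sketch below collapses raw event records into a two-way contingency table and its marginal distributions. The field names and values (`region`, `symptom`, and so on) are invented for the example, not taken from any surveillance system described in this chapter.

```python
from collections import Counter

# Hypothetical event records; in practice these would come from a
# surveillance database with many more categorical dimensions.
records = [
    {"region": "north", "symptom": "fever"},
    {"region": "north", "symptom": "rash"},
    {"region": "south", "symptom": "fever"},
    {"region": "north", "symptom": "fever"},
]

# Joint counts form a 2-way contingency table over (region, symptom).
joint = Counter((r["region"], r["symptom"]) for r in records)
# Marginal distributions over each dimension separately.
by_region = Counter(r["region"] for r in records)
by_symptom = Counter(r["symptom"] for r in records)

print(joint[("north", "fever")])  # count for one cell of the table
print(by_region["north"])         # marginal count for one region
```

Scan-based detectors evaluate statistics like these over very many candidate subsets of the data, which is what makes fast aggregation a prerequisite for exhaustive screening.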
This chapter discusses the utility of data aggregation and a few computationally efficient implementations of it, using example applications in the areas of public health and food safety surveillance.
The next section introduces a data structure designed to efficiently represent large sets of multidimensional event data of the types often encountered in health surveillance. It is called T-Cube, and it is an extension of the AD-Tree: an in-memory data structure that efficiently caches answers to all conceivable queries against multidimensional databases of categorical variables (Moore and Lee 1998). T-Cube extends the idea of the AD-Tree toward an important task of very fast aggregation of multidimensional time series of counts of events. The attainable efficiencies support human users, who benefit from very fast responses to queries and from dynamic visualizations of data at interactive speeds. Fast querying also allows for large-scale mining of multidimensional categorical aggregates. Exhaustive searches for patterns in highly dimensional categorical data become computationally tractable, which minimizes the risk of missing something important by having to resort to a selective mode of surveillance.
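The chapter does not reproduce T-Cube's internals at this point, but the caching idea it inherits from the AD-Tree can be sketched as follows: precompute the count time series for every combination of attribute values (with `*` standing for "any value"), so that subsequent queries reduce to look-ups instead of data scans. This is a deliberately naive, hypothetical rendering; the actual AD-Tree and T-Cube structures avoid materializing the full exponential set of combinations.

```python
from collections import defaultdict
from itertools import product

def build_cube(events, n_weeks):
    """Cache an event-count time series for every attribute-value combination.

    `events` is a list of (attribute_tuple, week_index) pairs; a '*' in a
    key position acts as a wildcard. Memory cost here is exponential in the
    number of attributes -- the real structures are far more frugal.
    """
    cube = defaultdict(lambda: [0] * n_weeks)
    for attrs, week in events:
        # Index the event under every generalization of its attributes.
        for key in product(*[(a, "*") for a in attrs]):
            cube[key][week] += 1
    return cube

# Illustrative events: (region, symptom) pairs with a week index.
events = [(("north", "fever"), 0), (("north", "fever"), 1),
          (("south", "rash"), 1)]
cube = build_cube(events, n_weeks=2)
print(cube[("north", "fever")])  # series for one exact combination
print(cube[("*", "*")])          # overall weekly totals
```

Once built, any query over any subset of attribute values returns its aggregate time series in constant time, which is what makes exhaustive scanning and interactive visualization feasible.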
Aggregation of evidence across multiple streams of data is the topic of
the subsequent section. Exploiting corroborating evidence from separate
sources has many practical applications. The approach used here as an
illustration employs single-stream anomaly detectors and Fisher's method
of p-value aggregation to construct powerful multi-stream detectors (Roure
et al. 2007).
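Fisher's method combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below implements this with the standard library only, using the closed-form chi-square survival function available for even degrees of freedom; the single-stream p-values in the usage line are made-up illustrative numbers, not results from the cited detectors.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with df = 2k under the null; for
    even df the survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i)
                                 for i in range(k))

# Three individually unremarkable single-stream p-values (illustrative)
# can combine into much stronger multi-stream evidence.
print(fisher_combine([0.08, 0.05, 0.10]))
```

This captures the appeal of corroborating evidence: none of the three streams alone is significant at the 0.05 level, yet their combination is.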
Section 4 deals with feature aggregation for cross-stream analysis. As the
events of interest become sparser, it is typically more difficult to construct
reliable models of correlations between streams of data. Data aggregation
can sometimes come to the rescue, as in the example task of predicting the
risk of occurrence of Salmonella at a food factory, given its recent history of
sanitary inspections.
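A minimal sketch of what such feature aggregation might look like is given below. The record layout, window size, and feature names are all hypothetical stand-ins for the chapter's actual predictor inputs: the point is only that a factory's raw inspection history gets collapsed into a few summary features that a risk model can consume even when positive Salmonella events are rare.

```python
def aggregate_inspections(history, window=6):
    """Collapse a factory's recent inspection history into summary features.

    `history` is a list of (passed: bool, violations: int) tuples, most
    recent last. Window size and feature definitions are illustrative.
    """
    recent = history[-window:]
    n = len(recent)
    return {
        "n_inspections": n,
        "fail_rate": sum(1 for passed, _ in recent if not passed) / n,
        "total_violations": sum(v for _, v in recent),
        "failed_last": 0 if recent[-1][0] else 1,
    }

# Hypothetical history: four inspections, two of them failed.
history = [(True, 0), (False, 3), (True, 1), (False, 2)]
print(aggregate_inspections(history))
```

Aggregated features of this kind trade temporal detail for statistical stability, which is exactly the compromise the paragraph above describes for sparse events.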