The International Society for Disease Surveillance (ISDS) has recognized
the paucity of data (both authentic and realistic simulations) as an issue that
hinders the field's progression, and is currently working to create a data
repository for publicly available datasets. It recently sponsored a contest
using simulated data, but required all participants to delete the data after
the completion of the contest. There is a serious need for datasets, and
simulation holds promise for meeting it.
The first implementation of wholly simulated biosurveillance data in the form
of daily counts is the set of publicly available simulated background and
outbreak datasets by Hutwagner et al. (2005). The background series are generated from
a negative-binomial distribution with parameters set such that “means and
standard deviations were based on observed values from national and local
public health systems and biosurveillance surveillance systems. Adjustments
were made for days of the week, holidays, postholiday periods, seasonality,
and trend.” Other research, such as Fricker et al. (2008), has simulated
background data using an additive combination of terms representing level,
seasonal and day-of-week effects, and random noise. Our approach, as described
in Section 2.2, is similar in that we set or estimate levels and temporal patterns
from authentic data. However, our approach is more general in that it captures
two key dependence structures: autocorrelation and cross-correlation. In
particular, we include 1-day autocorrelation, which has been shown to be a major
property of daily biosurveillance time series (Burkom et al., 2007), and we
generate multivariate rather than univariate data: a set of time series rather
than a single time series at a time. Thus, there can be a dependence structure
between these series in the form of cross-correlations.
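To make the two dependence structures concrete, here is a minimal Python sketch. It is not the method of Section 2.2; it assumes a simplified additive model (level plus day-of-week effect plus noise, with no seasonal term) and gives each of two series AR(1) noise with coefficient `phi` (1-day autocorrelation) whose innovations are correlated with coefficient `rho` (cross-correlation). The function names, levels, and day-of-week values are hypothetical and chosen only for illustration.

```python
import math
import random


def correlated_ar1_noise(n_days, phi, rho, sigma, seed=0):
    """Two AR(1) noise series with lag-1 coefficient phi whose innovations have correlation rho."""
    rng = random.Random(seed)
    # Scaling innovations by sqrt(1 - phi^2) keeps the stationary std. dev. of each series at sigma.
    scale = sigma * math.sqrt(1.0 - phi ** 2)
    e1 = e2 = 0.0
    noise1, noise2 = [], []
    for _ in range(n_days):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is a standard normal with corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        e1 = phi * e1 + scale * z1
        e2 = phi * e2 + scale * z2
        noise1.append(e1)
        noise2.append(e2)
    return noise1, noise2


def simulate_counts(n_days, levels, dow_effect, phi=0.5, rho=0.6, sigma=5.0, seed=0):
    """Additive model: count = level + day-of-week effect + correlated AR(1) noise, floored at 0."""
    noises = correlated_ar1_noise(n_days, phi, rho, sigma, seed)
    return [
        [max(0, round(level + dow_effect[t % 7] + noise[t])) for t in range(n_days)]
        for level, noise in zip(levels, noises)
    ]


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorr(x):
    """Sample correlation between each day and the following day."""
    return corr(x[:-1], x[1:])


# Hypothetical inputs: two syndromes with daily levels 40 and 25 and a weekday/weekend pattern.
DOW = [6, 4, 3, 3, 2, -8, -10]
counts_a, counts_b = simulate_counts(730, levels=[40, 25], dow_effect=DOW)
print(round(lag1_autocorr(counts_a), 2), round(corr(counts_a, counts_b), 2))
```

With this construction the stationary noise of each series has lag-1 autocorrelation `phi` and the two noise series have contemporaneous correlation `rho`, so the two properties can be set independently; the day-of-week term then adds a shared weekly pattern on top.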
More recently, Siddiqi et al. (2007) developed a simulation method based on
linear dynamical systems, also known as Kalman filter models. They model the observed
series as a linear transformation from a series of latent variables, find a stable
linear transformation for those latent variables, and use this transformation to
re-create similar data and to extend it into the future. They modify standard
Kalman filter methods, incrementally adding constraints to create a system
whose linear transformation remains stable (all eigenvalues less than 1 in magnitude).
This method seems very promising, and we recommend using the methods
described here to evaluate its effectiveness at mimicking authentic data.
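The stability condition can be illustrated with a minimal Python sketch of a two-dimensional linear dynamical system, in which a latent state evolves as x_{t+1} = A x_t + w_t and the observed series is y_t = C x_t + v_t. Stability is checked by testing whether every eigenvalue of A has magnitude below 1. This is not Siddiqi et al.'s constraint-generation procedure; it only shows what the constraint guarantees. The matrices, noise levels, and function names are hypothetical.

```python
import cmath
import random


def eigenvalues_2x2(a):
    """Eigenvalues of [[p, q], [r, s]] from the characteristic polynomial lam^2 - tr*lam + det."""
    (p, q), (r, s) = a
    tr, det = p + s, p * s - q * r
    disc = cmath.sqrt(tr * tr - 4 * det)  # complex sqrt handles oscillatory (complex) modes
    return (tr + disc) / 2, (tr - disc) / 2


def is_stable(a):
    """A discrete-time LDS is stable when every eigenvalue of A has magnitude below 1."""
    return max(abs(lam) for lam in eigenvalues_2x2(a)) < 1.0


def simulate_lds(a, c, n_steps, state_sd=1.0, obs_sd=1.0, seed=0):
    """x_{t+1} = A x_t + w_t, y_t = C x_t + v_t, with a 2-dim latent state and scalar y_t."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    ys = []
    for _ in range(n_steps):
        ys.append(c[0] * x[0] + c[1] * x[1] + rng.gauss(0.0, obs_sd))
        x = [
            a[0][0] * x[0] + a[0][1] * x[1] + rng.gauss(0.0, state_sd),
            a[1][0] * x[0] + a[1][1] * x[1] + rng.gauss(0.0, state_sd),
        ]
    return ys


A_STABLE = [[0.9, -0.2], [0.1, 0.8]]    # eigenvalues 0.85 +/- 0.13i, magnitude about 0.86
A_UNSTABLE = [[1.05, 0.0], [0.0, 0.5]]  # eigenvalue 1.05 lies outside the unit circle
C = [1.0, 0.5]

print(is_stable(A_STABLE), is_stable(A_UNSTABLE))  # True False
```

Simulating with `A_UNSTABLE` produces a series that grows without bound, which is exactly what a stability constraint rules out when a learned system is used to extend data into the future.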
Finally, we note that to evaluate an algorithm's performance on biosurveillance
data, one must be able to simulate outbreak signals within the data. It is
common practice to evaluate algorithms by seeding real biosurveillance data
with simulated outbreak signals (e.g., Burkom et al., 2007; Goldenberg et al.,
2002; Reis and Mandl, 2003; Stoto et al., 2006, and many others). However,
simulating these outbreak signals accurately is even more difficult than
simulating the background biosurveillance data, because known examples of
outbreak signatures in health-care-seeking behavior are harder still to obtain.
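As a minimal sketch of the seeding practice, the Python code below adds a simulated outbreak to a background series. The outbreak shape is an assumption chosen purely for illustration: each of `n_cases` cases appears after a delay drawn from a lognormal distribution (a form often used for incubation periods), with hypothetical parameters and a hypothetical function name that carry no epidemiological justification.

```python
import random


def seed_outbreak(background, start_day, n_cases, mu=1.0, sigma=0.5, seed=0):
    """Return (seeded_series, signal); the background list itself is not modified."""
    rng = random.Random(seed)
    signal = [0] * len(background)
    for _ in range(n_cases):
        # Delay (in whole days) from outbreak start to the case's appearance in the data.
        day = start_day + int(rng.lognormvariate(mu, sigma))
        if day < len(background):  # cases that would fall after the last day are dropped
            signal[day] += 1
    seeded = [b + s for b, s in zip(background, signal)]
    return seeded, signal


# Hypothetical flat background of 10 visits/day for 60 days, outbreak of 100 cases on day 20.
background = [10] * 60
seeded, signal = seed_outbreak(background, start_day=20, n_cases=100, seed=7)
print(sum(signal), signal.index(max(signal)))
```

Because seeding leaves the background untouched, the injected `signal` is known exactly, so detection delay can be measured against `start_day`.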
We emphasize that generating a realistic multivariate outbreak signal must
be based on epidemiological and other relevant domain knowledge.