2.2 Data Simulation
2.2.1 Overview
As noted in Buckeridge et al. (2005), the main challenge is “complexity of
simulating background and outbreak signal,” and in particular,
To allow for meaningful evaluation of diverse algorithms, both normal
and outbreak data must be simulated in a manner that ensures sufficient
complexity and validity in terms of factors such as spatial patterns, tem-
poral patterns, and joint distributions of variables. As a simulation model
grows to meet these requirements, the number of parameters increases,
the ability to verify the model becomes difficult, and ultimately it becomes
more difficult to ensure the validity of the simulated data.
Our approach is thus to identify those features that seem central in authen-
tic data, estimate the appropriate parameters from authentic data, and use
them to stochastically generate new data. In particular, we use the statistical
structure of authentic multivariate time series derived from biosurveillance
data in order to simulate background data that have the same structure. We
can even mimic a particular dataset, thereby generating one or more stochas-
tic duplications of it.
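As a rough illustration of the estimate-then-generate idea, the following Python sketch estimates a mean vector and a covariance matrix from a stand-in data matrix and draws a new dataset with similar first- and second-order structure. The function names, the synthetic "authentic" data, and the plain multivariate normal model are assumptions made for illustration only. The sketch ignores autocorrelation and calendar effects and should not be read as the simulator described in this section.

```python
import numpy as np

def estimate_parameters(data):
    """Estimate the mean vector and covariance matrix of a (days x series) array."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # columns are treated as variables
    return mean, cov

def stochastic_duplicate(data, rng):
    """Draw a new dataset of the same shape from the estimated parameters.

    Assumption: a multivariate normal model with independent days,
    used here only to illustrate the idea.
    """
    mean, cov = estimate_parameters(data)
    return rng.multivariate_normal(mean, cov, size=data.shape[0])

rng = np.random.default_rng(0)

# Stand-in for authentic data: three correlated series over 365 days.
# These numbers are synthetic, not taken from any real dataset.
true_cov = np.array([[4.0, 2.0, 1.0],
                     [2.0, 3.0, 0.5],
                     [1.0, 0.5, 2.0]])
authentic = rng.multivariate_normal([50.0, 30.0, 20.0], true_cov, size=365)

duplicate = stochastic_duplicate(authentic, rng)
```

Calling `stochastic_duplicate` repeatedly with the same input would give several different duplications that share the estimated means and covariances.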
Our method for simulating multivariate time-series data includes several
prominent patterns that have been shown by various empirical studies to
exist in biosurveillance time series. Day-of-week (DOW) is a common pat-
tern. In emergency department visits in the United States, daily counts are
typically lower on weekends and higher on weekdays (Burkom et al., 2007),
but can also exhibit other daily patterns (e.g., Brillman et al., 2005; Reis and
Mandl, 2003), or none (Fricker, 2006). Grocery stores tend to have more traffic
on weekends, and therefore medication sales appear higher on weekends
(e.g., Goldenberg et al., 2002). Another common pattern is abnormal behavior
on holidays and postholidays (e.g., Fienberg and Shmueli, 2005; Zhang et al.,
2003) due to holiday closings (e.g., schools) or limited operation mode (e.g.,
pharmacies, hospitals). Another pattern exhibited by some series is seasonal
cyclical behavior such as annual or biannual (summer/winter) fluctuations.
The daily frequency of collection also leads to nonnegligible short-term auto-
correlation (see, e.g., Burkom et al., 2007; Lotze et al., 2008). Finally, there are
also dependencies between series that manifest as cross correlations.
Our simulator begins by generating “simple” multivariate time series that
include autocorrelation and cross correlation, and then adds DOW,
seasonal, and holiday effects to them.
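The two stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the base series use a first-order autoregressive recursion with a single coefficient `phi`, the DOW, seasonal, and holiday effects are applied multiplicatively, and every name and numeric value (`simulate_base`, `add_calendar_effects`, `dow_effect`, `holiday_factor`, and so on) is hypothetical. The output is real-valued; converting it to counts is not shown.

```python
import numpy as np

def simulate_base(n_days, corr, phi, rng):
    """Zero-mean, unit-variance series with cross correlation `corr`
    and lag-1 autocorrelation `phi` (assumed AR(1) structure)."""
    k = corr.shape[0]
    chol = np.linalg.cholesky(corr)
    # Cross-correlated innovations: each row has covariance `corr`.
    innovations = rng.standard_normal((n_days, k)) @ chol.T
    base = np.zeros((n_days, k))
    base[0] = innovations[0]
    for t in range(1, n_days):
        # Scaling by sqrt(1 - phi^2) keeps the marginal variance at 1.
        base[t] = phi * base[t - 1] + np.sqrt(1 - phi**2) * innovations[t]
    return base

def add_calendar_effects(base, level, scale, dow_effect,
                         season_amp, holidays, holiday_factor):
    """Apply hypothetical multiplicative DOW, annual, and holiday effects."""
    n_days = base.shape[0]
    days = np.arange(n_days)
    dow = dow_effect[days % 7][:, None]  # day 0 is treated as Monday
    season = 1 + season_amp * np.cos(2 * np.pi * days / 365.25)[:, None]
    series = (level + scale * base) * dow * season
    series[holidays] *= holiday_factor  # damp the listed holiday rows
    return series

rng = np.random.default_rng(1)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
base = simulate_base(730, corr, phi=0.3, rng=rng)

# Hypothetical effects: lower weekends, 10% annual swing, 30% holiday drop.
dow_effect = np.array([1.1, 1.05, 1.0, 1.0, 1.05, 0.9, 0.9])
holidays = np.array([0, 358, 365, 723])
simulated = add_calendar_effects(
    base,
    level=np.array([100.0, 60.0]),
    scale=np.array([8.0, 5.0]),
    dow_effect=dow_effect,
    season_amp=0.1,
    holidays=holidays,
    holiday_factor=0.7,
)
```

Keeping the correlation structure in `simulate_base` and the calendar effects in `add_calendar_effects` mirrors the two-stage description above and lets each effect be switched off independently, for example by passing a `dow_effect` of all ones.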
2.2.2 Creating Initial Multivariate Data
We generate a set of initial multivariate data from a multivariate normal dis-
tribution in the following way: a vector of means, a vector of variances, and a
 