the simulated outbreak signature. Before inserting it into the mimicked
dataset, we first add the explainable effects to the outbreak data.
Now the simulated outbreak signature (that includes the explainable
patterns) is inserted into the mimicked data, and labels are applied to each day
according to the labeling scheme described in the previous section. The end
result can be seen in Figure 2.13.
This entire process can then be repeated, inserting outbreaks on
different days, to create multiple datasets with different outbreak locations. This
provides a large number of example datasets with similar background data
and outbreak type, but which are stochastically different. Thus, it allows a
researcher to run an algorithm many times and summarize the results,
estimating an algorithm's average performance in terms of false alerts and
outbreaks detected.
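The repeated insert-and-evaluate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the figures. The function names, the binary per-day labels (1 on outbreak days, 0 otherwise), and the threshold detector are all hypothetical stand-ins; the actual labeling scheme is the one described in the previous section. The signature passed in is assumed to already include the explainable effects.

```python
import random


def insert_outbreak(background, signature, start):
    """Return a copy of the series with the signature added from `start`,
    plus per-day labels (assumed scheme: 1 = outbreak day, 0 = otherwise)."""
    series = list(background)
    labels = [0] * len(series)
    for offset, extra in enumerate(signature):
        series[start + offset] += extra
        labels[start + offset] = 1
    return series, labels


def threshold_detector(series, threshold):
    """Hypothetical stand-in detector: alert on any day above `threshold`."""
    return [1 if value > threshold else 0 for value in series]


def evaluate(background, signature, detector, n_runs, rng):
    """Insert the outbreak on a random day in each run, then summarize.

    Returns (average false alerts per run, fraction of outbreaks detected).
    """
    total_false_alerts = 0
    total_detected = 0
    last_start = len(background) - len(signature)
    for _ in range(n_runs):
        start = rng.randint(0, last_start)  # both bounds are inclusive
        series, labels = insert_outbreak(background, signature, start)
        alerts = detector(series)
        # A false alert is an alert on a day labeled as non-outbreak.
        total_false_alerts += sum(
            1 for a, l in zip(alerts, labels) if a and not l
        )
        # An outbreak counts as detected if any outbreak day raises an alert.
        total_detected += int(any(a and l for a, l in zip(alerts, labels)))
    return total_false_alerts / n_runs, total_detected / n_runs
```

Here the same background series is reused in every run for brevity; a fresh mimicked series could be passed in for each run instead, so that the backgrounds also differ stochastically.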
2.7 Summary and Future Work
2.7.1 Future Work
There are several potential improvements that could be made to the mimic
methodology. We anticipate that adding more lags to the mimic will increase
the accuracy of the patterns captured. While most of the autocorrelation is
captured by a single-day lag, additional lags still hold higher-level information
about the series. In addition, a more elaborate spline fit to estimate
seasonal components would be valuable and could potentially allow
mimics to be extended to longer series.
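One way to judge how much information lies beyond the first lag is to look at the sample autocorrelation of a series at several lags. The sketch below uses the standard sample autocorrelation estimator. It is offered as an illustrative diagnostic only, not as part of the mimic methodology, and the function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


def autocorrelations(series, max_lag):
    """Autocorrelations at lags 1..max_lag, in order."""
    return [autocorrelation(series, lag) for lag in range(1, max_lag + 1)]
```

Comparing these values for an authentic series and its mimic at lags 2, 3, 7, and so on would indicate whether a one-lag mimic leaves noticeable structure uncaptured.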
An alternative method for simulating health data is to simulate individual-level
activities within a city (such as visiting an ED or purchasing medication).
This approach was proposed and implemented in WSARE 3.0 (Wong et al., 2003). The
simulated individual-level events could then be aggregated to the level of
biosurveillance health series of the type examined here. Alternatively, one could
also modify WSARE's Bayesian method, using the conditional probabilities
of a case given combinations of characteristics as sufficient statistics from the
original health data, as another testable way to generate simulated series.
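The aggregation step mentioned above, from individual-level events to a daily count series, might look like the following minimal sketch. The event representation (a `(day_index, kind)` tuple) and the function name are assumptions made for illustration; they are not taken from WSARE.

```python
from collections import Counter


def aggregate_daily_counts(events, n_days, kind):
    """Collapse individual events into a daily count series for one event kind.

    `events` is an iterable of (day_index, kind) tuples, a hypothetical format.
    Days with no matching events get a count of zero.
    """
    counts = Counter(day for day, event_kind in events if event_kind == kind)
    return [counts.get(day, 0) for day in range(n_days)]
```

For example, a list of simulated ED visits and medication purchases would yield one daily series per event kind, each comparable in form to the series examined in this chapter.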
The evaluation tests considered here cannot detect certain types
of deviations between the authentic and mimicked datasets. Because
they do not consider the order of observations in time, they cannot find
differences in autocorrelation or other time-related deviations. For instance, if
all Saturday values were randomly reordered, the test results would be identical.
Similarly, if the daily observations were reordered so that the
marginal distribution stayed the same but the autocorrelation changed,
the test results would not change. In addition, these tests will not find
cases where the simulated data is too close to the original, such as when there