The Role of Data Aggregation in Public Health and Food Safety Surveillance - Biosurveillance: Methods and Case Studies

A fully developed T-Cube may grow large when the data have more than a few dimensions. It can, however, be severely pruned (using representational tricks originally developed for AD-trees) and still fully serve its purpose. First, there is no need to store any nodes that correspond to time series of all zeros. Second, we can remove all sub-trees starting at the node corresponding to the most frequent value of the variable to be instantiated at each “vary” node in the tree diagram. This eliminates large portions of the tree (cf. Figure 9.1), yet the removed information can be cheaply recomputed on the fly with simple arithmetic operations on the data stored in the remaining nodes. Additional memory savings can be attained by not developing the tree to its full depth and instead terminating it at nodes corresponding to attribute-value combinations that occur infrequently in the data, storing at each such node a set of pointers to the corresponding raw data records. This can reduce the memory requirements by another couple of orders of magnitude, with a trade-off in access time. These tricks enable T-Cube to fit in memory and still retrieve time series substantially faster than current database technology. That helps make many data-intensive analytic algorithms practical. It also enables user-level interactive visualization as well as real-time navigation through large sets of multidimensional temporal data.
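
The most-frequent-value pruning can be illustrated with a small sketch. It is not the T-Cube implementation: it handles a single attribute rather than a full tree, and the record layout (a `day` index plus named attributes) and all function names are assumptions made for illustration. The point is only that the series dropped for the most frequent value can be recovered by subtracting the stored series from the parent total, and that all-zero series need not be stored at all.

```python
from collections import defaultdict

def aggregate(records, attribute, n_days):
    """Build {value: daily count series} for one attribute (hypothetical record layout)."""
    series = defaultdict(lambda: [0] * n_days)
    for rec in records:
        series[rec[attribute]][rec["day"]] += 1
    return dict(series)

def prune_most_frequent(series_by_value):
    """Keep the parent total and drop the series of the most frequent value."""
    n_days = len(next(iter(series_by_value.values())))
    total = [sum(s[d] for s in series_by_value.values()) for d in range(n_days)]
    most_frequent = max(series_by_value, key=lambda v: sum(series_by_value[v]))
    stored = {v: s for v, s in series_by_value.items() if v != most_frequent}
    return total, most_frequent, stored

def lookup(value, total, most_frequent, stored):
    """Return the series for a value, recomputing the pruned one by subtraction."""
    n_days = len(total)
    if value == most_frequent:
        return [total[d] - sum(s[d] for s in stored.values()) for d in range(n_days)]
    # Values never observed have no stored node; their series is all zeros.
    return stored.get(value, [0] * n_days)
```

In this sketch the pruned value costs one subtraction per stored sibling at query time, which is the access-time trade-off the text describes.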

Efficiencies offered by T-Cube enable large-scale mining of multidimensional bio-event data. In general, public health analysts may not know a priori which subsets of data require their immediate attention. Often, they resort to selective monitoring driven by intuition and experience, and they therefore risk missing less obvious patterns that occur outside the scope of routine surveillance. This risk can be alleviated by massive screening through multiple (ideally all possible) aggregations of data in search of those that contain the most statistically surprising abnormalities. T-Cube makes such tasks feasible when using popular temporal anomaly detection algorithms such as cumulative sum (Page 1954), temporal scan (Naus and Wallenstein 2006, Dubrawski et al. 2007a), or spatial scan (Kulldorff 1997, Kulldorff et al. 2007, Neill and Moore 2004, Neill and Cooper 2009).
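
A minimal sketch of such massive screening, using a one-sided cumulative sum as the detector, might look as follows. The record layout, the `slack` allowance, the baseline estimate (mean of the first `baseline_days` days), and all function names are assumptions for illustration. The brute-force `daily_counts` scan stands in for the fast time series retrieval that T-Cube would provide.

```python
from itertools import product

def cusum_max(series, baseline_mean, slack=0.5):
    """One-sided upper CUSUM; returns the largest value the statistic reaches."""
    s, peak = 0.0, 0.0
    for x in series:
        s = max(0.0, s + (x - baseline_mean - slack))
        peak = max(peak, s)
    return peak

def daily_counts(records, query, n_days):
    """Count records per day matching a query; None means 'any value'."""
    counts = [0] * n_days
    for rec in records:
        if all(v is None or rec[a] == v for a, v in query.items()):
            counts[rec["day"]] += 1
    return counts

def screen(records, attributes, n_days, baseline_days, slack=0.5):
    """Score every attribute-value aggregation and rank by CUSUM peak."""
    domains = [sorted({rec[a] for rec in records}) + [None] for a in attributes]
    results = []
    for combo in product(*domains):
        query = dict(zip(attributes, combo))
        counts = daily_counts(records, query, n_days)
        baseline = sum(counts[:baseline_days]) / baseline_days
        score = cusum_max(counts[baseline_days:], baseline, slack)
        results.append((score, query))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

The number of aggregations grows multiplicatively with the number of attributes, which is why each retrieval has to be cheap for exhaustive screening to be practical.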

Bivariate temporal scan and Bayesian spatial scan are the analytic methods of choice in the Real-Time Biosurveillance Project (RTBP) (Dubrawski et al. 2009a). They are used for rapid and reliable detection of emerging disease outbreaks manifested in public health data collected in Sri Lanka and, separately, in the Indian state of Tamil Nadu. The RTBP consists of an information-gathering component based on mobile handheld devices and wireless networking, and a data analysis and visualization component based on T-Cube. The latter enables geo-temporal visualization of syndromic data, navigation through different levels of data aggregation, and prospective and retrospective screening of data for patterns of public health interest.
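
The text does not spell out how the bivariate temporal scan is computed. One common formulation, assumed here for illustration and not necessarily the one used in the RTBP, slides a fixed-width window over two aligned series (a target count and a baseline count) and tests a 2x2 table of window versus earlier history with a chi-square statistic. All names below are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.o.f. via the complementary error function.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def bivariate_temporal_scan(target, baseline, window):
    """For each window end day, compare target/baseline inside the window
    against all earlier days. Returns a list of (end_day, chi2, p)."""
    results = []
    for end in range(window + 1, len(target) + 1):
        start = end - window
        a, b = sum(target[start:end]), sum(baseline[start:end])
        c, d = sum(target[:start]), sum(baseline[:start])
        chi2, p = chi_square_2x2(a, b, c, d)
        if a * d <= b * c:
            p = 1.0  # assumption: only increases in the target share are of interest
        results.append((end - 1, chi2, p))
    return results
```

Using a baseline series (for example, all reported cases) alongside the target series lets the test discount changes in overall reporting volume that would otherwise look like outbreaks.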

Figure 9.2 presents results of an analytic scenario applied to the record of reportable diseases reported to the Sri Lankan Ministry of Healthcare
