A completely developed T-Cube may grow large when data have more than
a few dimensions. But it can be severely pruned (using representational tricks
originally developed for AD-trees) and still fully serve its purpose. Firstly,
there is no need to store any nodes that correspond to time series of all zeros.
Secondly, we can remove all sub-trees starting at the node corresponding to
the most frequent value of the variable to be instantiated at each “vary” node
in the tree diagram. This eliminates large portions of the tree (cf. Figure 9.1),
and still the removed information can be cheaply recomputed on-the-fly with
simple arithmetic operations using the data stored in the remaining nodes.
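The second pruning rule can be illustrated with a minimal sketch, assuming a single "vary" node over one attribute and daily case counts as the time series. The record layout (a dict with a "day" key) and the names build_pruned_index and query are hypothetical and are not part of T-Cube itself. The series for the most frequent value is never stored; it is recovered as the parent total minus the stored sibling series.

```python
from collections import Counter

def build_pruned_index(records, attribute, n_days):
    """Store one daily-count series per attribute value, except the most
    frequent value, whose series is dropped (hypothetical layout)."""
    total = [0] * n_days
    per_value = {}
    freq = Counter(r[attribute] for r in records)
    for r in records:
        total[r["day"]] += 1
        per_value.setdefault(r[attribute], [0] * n_days)[r["day"]] += 1
    most_common_value = freq.most_common(1)[0][0]
    del per_value[most_common_value]  # prune the largest branch
    return total, per_value, most_common_value

def query(index, value, n_days):
    """Return the daily-count series for one attribute value."""
    total, per_value, most_common_value = index
    if value == most_common_value:
        # Recompute the pruned series: parent total minus stored siblings.
        series = list(total)
        for sibling in per_value.values():
            for day, count in enumerate(sibling):
                series[day] -= count
        return series
    # Values never seen have no node at all: their series is all zeros.
    return list(per_value.get(value, [0] * n_days))

records = [
    {"day": 0, "sex": "F"}, {"day": 0, "sex": "F"}, {"day": 1, "sex": "F"},
    {"day": 1, "sex": "M"}, {"day": 2, "sex": "F"},
]
index = build_pruned_index(records, "sex", n_days=3)
print(query(index, "F", 3))  # recovered by subtraction, not stored
```

The subtraction costs one pass over the stored siblings, which is the "simple arithmetic" trade for not keeping the largest branch in memory.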
Additional memory savings can be attained by not developing the tree to
its full depth and instead terminating it at nodes corresponding to
attribute-value combinations that occur infrequently in the data, storing at
each such node a set of pointers to the corresponding raw data records. This
can reduce the memory requirements by another couple of orders of magnitude,
at the cost of longer access time. These tricks enable T-Cube to fit in memory
and still retrieve time series substantially faster than current database
technology. That helps make many data-intensive analytic algorithms practical.
It also
enables user-level interactive visualization as well as real-time navigation
through large sets of multidimensional temporal data.
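The early-termination idea can be sketched as follows, under the assumption that a terminal node holds either a precomputed series or a list of row indices into the raw data. The threshold name min_count, the tuple-based node layout, and both function names are hypothetical choices made for illustration only.

```python
def build_leaf(records, row_ids, n_days, min_count):
    """Build a terminal node for one attribute-value combination."""
    if len(row_ids) < min_count:
        # Rare combination: keep only pointers to the raw records.
        return ("rows", list(row_ids))
    series = [0] * n_days
    for i in row_ids:
        series[records[i]["day"]] += 1
    return ("series", series)

def leaf_series(leaf, records, n_days):
    """Return the daily-count series for a terminal node."""
    kind, payload = leaf
    if kind == "series":
        return payload
    # Pointer leaf: rebuild the series on demand from the raw records.
    series = [0] * n_days
    for i in payload:
        series[records[i]["day"]] += 1
    return series

raw = [{"day": 0}, {"day": 2}, {"day": 2}, {"day": 1}]
rare = build_leaf(raw, [1, 2], n_days=3, min_count=3)
common = build_leaf(raw, [0, 1, 2, 3], n_days=3, min_count=3)
print(rare[0], leaf_series(rare, raw, 3))
print(common[0], leaf_series(common, raw, 3))
```

A pointer leaf costs memory proportional to the number of matching records rather than to the length of the time series, which is why it pays off only for rare combinations and why access becomes slower.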
Efficiencies offered by T-Cube enable large-scale mining of multidimensional
bio event data. In general, public health analysts may not know
a priori which subsets of data require their immediate attention. Often,
they resort to selective monitoring driven by intuition and experience, and
are therefore exposed to the risk of missing less obvious patterns
occurring outside the scope of routine surveillance. This can be alleviated
by implementing massive screening through multiple (ideally all possible)
aggregations of data in search of those that contain the most statistically sur-
prising abnormalities. T-Cube makes such tasks feasible when using popular
temporal anomaly detection algorithms such as cumulative sum (Page 1954),
temporal scan (Naus and Wallenstein 2006, Dubrawski et al. 2007a), or spatial
scan (Kulldorff 1997, Kulldorff et al. 2007, Neill and Moore 2004, Neill and
Cooper 2009).
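The massive screening described above can be sketched as a loop over every conjunctive query, scoring each resulting time series with a one-sided cumulative sum. This is a minimal sketch under stated assumptions: the baseline is the in-sample mean, the allowance k is a hypothetical tuning parameter, and the linear scan over records stands in for the fast T-Cube lookup. The function names cusum_max and screen are illustrative only.

```python
from itertools import combinations, product

def cusum_max(series, k=0.5):
    """Peak of the one-sided CUSUM S_t = max(0, S_{t-1} + x_t - mean - k)."""
    mean = sum(series) / len(series)
    s = peak = 0.0
    for x in series:
        s = max(0.0, s + (x - mean - k))
        peak = max(peak, s)
    return peak

def screen(records, attributes, n_days, top=3):
    """Score every attribute-value conjunction and return the top scorers."""
    values = {a: sorted({r[a] for r in records}) for a in attributes}
    results = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            for combo in product(*(values[a] for a in subset)):
                query = dict(zip(subset, combo))
                # Stand-in for a T-Cube lookup: aggregate by scanning records.
                series = [0] * n_days
                for r in records:
                    if all(r[a] == v for a, v in query.items()):
                        series[r["day"]] += 1
                results.append((cusum_max(series), query))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]

records = (
    [{"day": d, "disease": "cold", "region": "S"} for d in range(4)]
    + [{"day": 3, "disease": "flu", "region": "N"} for _ in range(4)]
)
for score, query in screen(records, ["disease", "region"], n_days=4):
    print(score, query)
```

The number of queries grows multiplicatively with the number of attributes and their values, so the cost of each series retrieval dominates; that is the step a structure like T-Cube is meant to make cheap.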
Bi-variate temporal scan and Bayesian spatial scan are the analytic meth-
ods of choice in the Real-Time Biosurveillance Project (RTBP) (Dubrawski
et al. 2009a). They are used for rapid and reliable detection of emerging
outbreaks of diseases manifested in public health data collected in
Sri Lanka and, separately, in the Indian state of Tamil Nadu. The RTBP consists
of an information-gathering component based on mobile handheld devices
and wireless networking, and a data analysis and visualization
component based on T-Cube. The latter enables geo-temporal visualization of
syndromic data, navigation through different levels of data aggregation, as
well as prospective and retrospective screening of data for patterns of public
health interest.
Figure 9.2 presents the results of an analytic scenario applied to the record
of reportable diseases submitted to the Sri Lankan Ministry of Healthcare