of regulatory violations, we note a moderate increase in uncertainty about
the lift estimates when using more specific information. Despite that, a model
that uses public health-related non-compliances consistently outperforms
the less specific alternative across a range of observation window widths.
At the shortest observation interval of seven days, recording a public health-
related regulatory violation increases the factory's chance of recording a positive
for Salmonella during the following week 2.75 times on average, relative to
a model that ignores the recorded non-compliances. As expected, the
lift values decrease as the lengths of evidence aggregation periods increase.
Observing a regulatory violation over longer periods of time simply becomes
more common and hence less useful in estimating the near-future risk of
microbial contamination of food. The presented approach provides an
example of how a simple analysis combined with insightful aggregation of
evidence can lead to useful discoveries (Food Safety and Inspection Service
2008).
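The lift described above can be read as the ratio of two rates: the rate of Salmonella positives in the week following an observation window that contained a violation, divided by the rate of positives in any week regardless of violations. The following is a minimal sketch of that ratio for a single facility, not the analysis used in the cited study. The function name, the integer day indexing, and the default window and horizon lengths are assumptions made for illustration only.

```python
def lift(violation_days, positive_days, n_days, window=7, horizon=7):
    """Estimate lift for one facility (illustrative sketch, hypothetical API).

    violation_days, positive_days: sets of integer day indices.
    For each evaluation day t, the evidence window is [t - window, t)
    and the prediction horizon is [t, t + horizon).
    """
    exposed = exposed_hits = total = total_hits = 0
    for t in range(window, n_days - horizon + 1):
        had_violation = any(d in violation_days for d in range(t - window, t))
        had_positive = any(d in positive_days for d in range(t, t + horizon))
        total += 1
        total_hits += had_positive
        if had_violation:
            exposed += 1
            exposed_hits += had_positive
    if exposed == 0 or total_hits == 0:
        return float("nan")  # lift is undefined without exposure or positives
    # P(positive | violation in window) / P(positive), the baseline rate
    return (exposed_hits / exposed) / (total_hits / total)
```

In practice the four counts would presumably be pooled over many facilities before taking the ratio, and lengthening `window` makes `had_violation` true more often, which is consistent with the decrease in lift noted above.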
9.5 Aggregating Evidence Using Graph Representations
In some application scenarios, because of sparseness of the available data,
infeasibility of independence assumptions, or the presence of an explicit or
implicit notion of linkage between data elements, it makes practical sense to
represent data objects as linked entities. Representing data in such a way can
be useful in general, as it usually allows leveraging a range of existing
algorithms originally developed for social network analysis.
One example comes from the food safety domain, where predicting the
risk of positive outcomes of microbial tests becomes harder when such
events are naturally rare or when a highly specific event designation is
needed (e.g., predicting the occurrence of one of hundreds of specific
Salmonella serotypes as opposed to Salmonella in general), which substantially
reduces the amount of available data per type of event. In such cases, it
may be useful to consider methods that take advantage of either explicit (e.g.,
corporate membership, supplier-receiver relationship) or implicit (e.g.,
temporal co-occurrence of specific microbial serotypes) linkages between
entities (e.g., food-processing plants) in the data to boost predictability of
adverse events even when they are sparse. The approach takes advantage of similarities
between entities to learn reliable models from evidence aggregated across
multiple entities that are relatively close to each other in the resulting graph
(Sarkar et al. 2008, Dubrawski et al. 2009b, Dubrawski et al. 2007b).
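One simple way to picture aggregation across nearby entities is to pool test counts from facilities within a few hops of the facility of interest, giving less weight to more distant ones. The sketch below is an illustration under stated assumptions and does not reproduce the methods of the cited papers: the adjacency-dictionary graph, the `(positives, tests)` count format, the hop limit, and the geometric `decay` weighting are all hypothetical choices.

```python
from collections import deque


def hop_distances(graph, source, max_hops):
    """Breadth-first search: {node: hop count} for nodes within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pooled_rate(graph, counts, facility, max_hops=2, decay=0.5):
    """Distance-weighted positive rate pooled over nearby facilities.

    counts maps facility -> (positives, tests); a facility at hop
    distance d contributes with weight decay ** d (assumed weighting).
    """
    positives = tests = 0.0
    for node, d in hop_distances(graph, facility, max_hops).items():
        p, n = counts.get(node, (0, 0))
        weight = decay ** d
        positives += weight * p
        tests += weight * n
    return positives / tests if tests else float("nan")


# Hypothetical links, e.g. shared corporate membership.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
counts = {"A": (0, 2), "B": (1, 4), "C": (2, 8), "D": (5, 5)}
rate_for_a = pooled_rate(graph, counts, "A")
```

Here facility A has no positives of its own, so its estimate is informed by its neighbors B and C, while the unconnected facility D contributes nothing.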
In one application scenario, food production facilities are modeled as one
type of entity in a bipartite graph that evolves over time. The other type of
entity denotes various specific strains of Salmonella. Two entities are linked
in the graph if a microbial test of a food sample conducted at the specific food
 