Table 9.1
Recall Scores of the DSNL Model Compared against the Baseline, Which Only
Considers Histories of the Individual Food Factories Independently of Each Other

Fraction of the most likely serotypes    2.5%     5.0%     10.0%    15.0%    20.0%
DSNL                                     39.6%    53.7%    69.3%    80.4%    86.5%
Baseline                                 44.7%    52.9%    59.5%    61.9%    65.6%

such results averaged across all considered factories for varying values of the
recall threshold (labeled in the table as the fraction of the most likely
serotypes). The DSNL model is compared against a baseline that takes into
account only the individual factory histories when making predictions. The
DSNL model, which leverages aggregation of data collected at factories that end
up nearby in the graph, significantly outperforms the baseline at recall
thresholds of 10% and greater, and it correctly identifies more than 85% of the
components of the top 20% of the ranking of predicted Salmonella serotypes. The
differences in performance between DSNL and the baseline model observed at
recall thresholds of 5% and lower are not statistically significant. These
results show the benefits of relaxing the assumption of independence between
factories, and they support the idea of combating sparseness of event data with
link-entity based data aggregation.
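To make the recall threshold concrete, the following minimal sketch computes recall at a given fraction of a ranked serotype list and averages it across factories. It rests on an assumed reading of the metric, since the text here does not define it: recall is taken to be the share of actually observed serotypes that fall within the top fraction of the model's ranking. The function names, the placeholder serotype labels, and the two toy factories are hypothetical and are not taken from the study.

```python
import math

def recall_at_fraction(ranking, observed, fraction):
    """Share of observed serotypes that appear in the top `fraction` of `ranking`.

    ranking  -- serotypes ordered from most to least likely (model output)
    observed -- serotypes that were actually isolated afterwards
    """
    if not observed:
        raise ValueError("need at least one observed serotype")
    # Small epsilon guards against floating-point noise in the product.
    cutoff = max(1, math.ceil(fraction * len(ranking) - 1e-9))
    top = set(ranking[:cutoff])
    return sum(1 for s in observed if s in top) / len(observed)

def mean_recall(cases, fraction):
    """Average recall over (ranking, observed) pairs, one pair per factory."""
    scores = [recall_at_fraction(r, o, fraction) for r, o in cases]
    return sum(scores) / len(scores)

# Hypothetical data: 20 placeholder serotype names and two factories.
serotypes = ["S%02d" % i for i in range(1, 21)]
cases = [
    (serotypes, ["S01", "S05"]),        # factory 1
    (serotypes[::-1], ["S20", "S19"]),  # factory 2, reversed ranking
]
for fraction in (0.05, 0.10, 0.25):
    print(fraction, mean_recall(cases, fraction))
```

On this toy input the averaged recall rises as the threshold widens, which is the same kind of curve that each row of Table 9.1 reports.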
Another approach uses the Activity From Demographics and Links (AFDL)
algorithm (Dubrawski et al. 2007b) to predict the likelihood of positive
isolates obtained from microbial testing of food samples collected at food
factories. AFDL is a computationally efficient method for estimating the
activity of unlabeled entities in a graph from the connectivity patterns of
known active entities and from their demographic profiles. Quantitative
connectivity features are extracted from the topology of the graph, separately
for each node, using a computationally efficient random walk algorithm and an
appropriate parameterization scheme. Each entity is then represented by a data
vector that combines its connectivity and demographic features with the label
of its status, if known. Such data can then be fed into a classifier that
models and predicts the probabilities of activity of unlabeled entities; AFDL
uses logistic regression for this purpose.
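The pipeline described above (connectivity features from random walks, combined with demographic features, then logistic regression) can be illustrated with a small, self-contained sketch. This is not the published AFDL implementation: the short uniform random walk, the single demographic feature, the gradient-descent fitting routine, and the six-factory toy graph are all assumptions chosen for illustration, and every name in the code is hypothetical.

```python
import math

def walk_feature(graph, active, start, steps=1):
    """Probability that a uniform random walk of `steps` hops from `start`
    ends on a known active node other than `start` itself."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            # Walk mass sitting on a node without neighbors is dropped.
            if p > 0.0 and graph[node]:
                share = p / len(graph[node])
                for neighbor in graph[node]:
                    nxt[neighbor] += share
        prob = nxt
    return sum(p for node, p in prob.items() if node in active and node != start)

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Logistic model; the last weight is the bias term."""
    return sigmoid(sum(w * x for w, x in zip(weights, features + [1.0])))

def fit_logistic(rows, labels, rate=0.5, epochs=2000):
    """Plain batch gradient descent on the mean logistic loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, label in zip(rows, labels):
            error = predict(weights, row) - label
            for j, x in enumerate(row + [1.0]):
                grad[j] += error * x
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy graph of six factories (undirected adjacency lists).
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
# Hypothetical demographic feature per factory, scaled to [0, 1].
demographic = {"A": 0.8, "B": 0.6, "C": 0.7, "D": 0.3, "E": 0.2, "F": 0.25}
# Known status: 1 = active, 0 = inactive. C and F are unlabeled.
status = {"A": 1, "B": 1, "D": 0, "E": 0}
active = {node for node, label in status.items() if label == 1}

def vector(node):
    """Data vector for one entity: [connectivity feature, demographic feature]."""
    return [walk_feature(graph, active, node), demographic[node]]

labeled = sorted(status)
weights = fit_logistic([vector(n) for n in labeled], [status[n] for n in labeled])
# Predicted probability of activity for each unlabeled factory.
scores = {n: predict(weights, vector(n)) for n in graph if n not in status}
print(scores)
```

In this toy setup, factory C is linked to both known active factories and receives a probability above 0.5, while factory F, which is linked only to inactive factories, falls below 0.5. A practical implementation would likely use a tuned walk parameterization and an established logistic regression library rather than the hand-written fitting loop shown here.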
The AFDL algorithm was originally developed for social network analysis and
intelligence applications, but it has also been used to support analyses of
food safety data. One possible usage scenario involves predicting positive
outcomes of microbial tests of food samples taken at food factories, based on
historical records of microbial test results and on characteristic properties
of these factories. Again, temporal co-occurrences of the same strain of bacteria
 