[Figure labels: Automatic Analysis; Content and Metadata Storage and Tools; Exploitation Tools; Source Preprocessing; Event Alerter; Content (RSS); Machine Translation; News Search Interface; Knowledge Repository; Document Links; Annotations; Knowledge Objects; Markup (XML); Topic Classification; Named Entity Analysis; Biogeographic View; Trends Analyser; Event Extraction; Ontology Matching (en, es, fr, id, ja, ko, ms, ru, th, vi, zh); BioCaster Ontology (BCO) in OWL; Alert Detection]
Figure 15.3
Data pathway through the BioCaster system.
modules for some languages (English, Japanese, Thai, and Vietnamese),
MT is a cost-effective option when dedicated modules do not exist.
Use of MT allows the English rule topic to be applied. The resulting target
text inevitably loses signal (e.g., “swine influenza” can appear as the less
preferable “pig influenza” when translated), but in practice, it can often be
understood after further processing in the later stages due to redundancy
of information in the news report. A small amount of tailoring in pattern
rules has been made to accommodate nonstandard terminology such as
“pig influenza.”
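
As a rough illustration of accommodating nonstandard terminology, the sketch below maps a translated variant back to its preferred term before later processing. It is a minimal sketch, not the BioCaster pattern-rule language: the `TERM_VARIANTS` table and the `normalize_terms` function are hypothetical names introduced here, and only the "pig influenza" example comes from the text.

```python
import re

# Hypothetical lookup table: nonstandard MT output -> preferred term.
# Only the "pig influenza" entry is taken from the text; the table itself is assumed.
TERM_VARIANTS = {
    "pig influenza": "swine influenza",
}


def normalize_terms(text, variants=TERM_VARIANTS):
    """Replace each nonstandard variant with its preferred term.

    Matching is case-insensitive and restricted to whole words, so
    "Pig Influenza" is rewritten but "pig influenzas" is left alone.
    """
    for variant, preferred in variants.items():
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        text = pattern.sub(preferred, text)
    return text


if __name__ == "__main__":
    print(normalize_terms("Officials confirmed three cases of pig influenza."))
```

Normalizing the text is only one option; an alternative would be to widen the matching rules so that they accept both forms directly.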
After machine translation comes topic classification
(Doan et al. 2009). In practice, at this stage, the content examined
could concern almost any topic; it is important, then, that the “gatekeeper”
be aware of both major and subtle differences between topics. For example,
it should be relatively easy to spot the vocabulary differences between a
report on a soccer team winning a match and one on an outbreak of dengue
fever. But this task is not always straightforward for reports
about chronic diseases, polio prevention campaigns, or developments
in vaccines. For the BioCaster group, it was first necessary to construct an
objective case definition for the topics that the system would allow. This was
done in collaboration between a linguist and a public health expert. In practice,
guidelines were heavily influenced by the WHO IHR Annex 2 decision
tree, and more detail was added to address potential ambiguities that might
arise in practice. The guidelines were then used to hand-annotate a gold
standard media corpus containing approximately 1000 classified texts; of
these, one-third were positive on the topic of infectious disease outbreaks,