of reports. Integration of manual and automated analyses is also given consideration. Given time and space limitations, the survey was intended only to be indicative of current technology trends, without claiming to be comprehensive. A number of other systems that perform similar functions also exist, as do sites that specialize in reporting natural disaster information, such as the United Nations' Global Disaster Alert and Coordination System (GDACS). Of particular note are human network systems; ProMED-mail is an outstanding example that is used as a source by several of the systems we survey here (Madoff 2004). To provide technological context for the discussion that follows, we briefly detail the basic methodological processes that a prototypical system needs to employ; illustrative code sketches for the first six stages follow the list:
1. Data ingestion is the first stage of processing, with sources originating from a variety of document types such as e-mails, newswire reports, business reports, and blogs (Web logs). Content can be formatted in standard syntaxes, including HTML (HyperText Markup Language), RSS (Really Simple Syndication) feeds, and PDF (Portable Document Format) documents.
2. Data cleansing is a technologically mundane process; however, it is vital in practice, both to remove unwanted noise from the text (e.g., advertisements or links to unrelated news stories) and to rejoin broken sentences.
3. Data triage is applied after the first two stages; it is the stage during which the more-or-less clean text is grouped into topic categories, either for trashing or for subsequent processing using detailed fact extraction. Trashing is necessary for documents that fall clearly outside the task definition. At this stage, redundant information (e.g., multiple reports of the same event) is usually detected through document clustering.
4. Machine translation of the source text may be required during the
data triage stage if the system does not have a native fact extraction
capability in the source language.
5. Fact extraction is used to obtain structured information about an event, such as the name of the condition, the type of agent, the number of victims, and the time and location at which the event happened. In other words, this is the who, what, where, when, and how of an event.
6. Significance scores are calculated using results from the available data. These may come from the data triage stage alone or be computed in conjunction with fact extraction. High-end systems use sophisticated statistical analysis to assign an alerting level to each detected event.
7. Human judgment is key throughout these processes. It is almost always needed to understand what is abnormal and to discover rare events that automated analysis alone may miss.
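As a minimal illustration of stage 1, the Python sketch below ingests items from an RSS feed using only the standard library. The feed URL is a placeholder, and a real system would add parsers for the other formats mentioned above (HTML pages, PDF documents, e-mail).

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.org/outbreak-news/rss.xml"  # placeholder feed

def ingest_rss(url):
    """Fetch an RSS feed and yield one dictionary per <item> element."""
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for item in tree.iterfind(".//item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }

if __name__ == "__main__":
    for doc in ingest_rss(FEED_URL):
        print(doc["title"])
```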
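Stage 2 can often be approximated with simple heuristics. The sketch below assumes that noise lines match a small set of patterns (the patterns shown are illustrative, not drawn from any surveyed system) and that a line ending without sentence punctuation was broken by page layout.

```python
import re

# Illustrative noise patterns; a real system would tune these per source.
NOISE = re.compile(r"advertisement|sponsored|related stories|click here", re.I)

def cleanse(text):
    """Drop noisy lines, then rejoin lines broken mid-sentence."""
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not NOISE.search(ln)]
    joined = []
    for ln in lines:
        # A previous line that does not end in sentence punctuation was
        # probably split by layout, so glue the current line onto it.
        if joined and not joined[-1].endswith((".", "!", "?", ":")):
            joined[-1] += " " + ln
        else:
            joined.append(ln)
    return "\n".join(joined)
```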
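For the redundancy-detection part of stage 3, one common technique (assumed here; the surveyed systems may differ) is to compare documents as bag-of-words vectors and treat high cosine similarity as evidence of duplicate reporting. The 0.8 threshold is an arbitrary illustration.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector over lowercased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def find_duplicates(docs, threshold=0.8):
    """Return index pairs of documents likely reporting the same event."""
    vecs = [vectorize(d) for d in docs]
    return [(i, j)
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) >= threshold]
```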
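Stage 4 is essentially a routing decision: a document is sent through translation only when its language is outside the extractor's native coverage. The sketch below uses a crude stopword-overlap language guess as a stand-in for a real language identifier, and leaves the translation call as an explicit stub.

```python
# Crude stopword inventories; a real system would use a trained identifier.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "is"},
    "fr": {"le", "la", "et", "de", "est"},
    "es": {"el", "los", "y", "de", "es"},
}
NATIVE_LANGUAGES = {"en"}  # languages the fact extractor handles directly

def guess_language(text):
    tokens = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))

def route(document):
    """Pass a native-language document through; otherwise translate first."""
    lang = guess_language(document)
    if lang in NATIVE_LANGUAGES:
        return document
    return translate(document, source=lang, target="en")

def translate(text, source, target):
    # Stub: plug in whatever machine translation service is available.
    raise NotImplementedError
```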
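A heavily simplified stage-5 extractor is sketched below: a single regular expression that pulls the victim count, condition, and location out of sentences shaped like "N cases of X reported in Y". Real systems use much richer grammars, gazetteers, and temporal normalizers; the pattern here is purely illustrative.

```python
import re

# Matches sentences like "12 cases of cholera reported in Dhaka".
PATTERN = re.compile(
    r"(?P<count>\d+)\s+cases?\s+of\s+(?P<condition>[a-zA-Z ]+?)\s+"
    r"reported\s+in\s+(?P<location>[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*)"
)

def extract_facts(sentence):
    """Return a structured event record, or None if no pattern matches."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return {
        "condition": m.group("condition").strip(),
        "victims": int(m.group("count")),
        "location": m.group("location"),
    }

print(extract_facts("12 cases of cholera reported in Dhaka on 3 May"))
# -> {'condition': 'cholera', 'victims': 12, 'location': 'Dhaka'}
```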
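For stage 6, one widely used baseline (assumed here rather than taken from any particular surveyed system) is to z-score the current case count against a historical window and map the score onto alert levels; the thresholds below are illustrative only.

```python
import statistics

def alert_level(history, current):
    """Map the z-score of the current count against history to an alert level."""
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)
    z = (current - mean) / sd if sd else 0.0
    # Illustrative cut points; production systems calibrate per disease/region.
    if z >= 3.0:
        return "red"
    if z >= 2.0:
        return "orange"
    if z >= 1.0:
        return "yellow"
    return "green"

print(alert_level([2, 3, 1, 4, 2, 3, 2], current=9))  # -> red
```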