MedISys covers the world, with geographic information contained in a
multilingual gazetteer (Atkinson et al. 2008). Data are sourced by Web-crawling
4,000 news sites in 43 languages, as well as ProMED-mail. News from commercial
aggregators such as LexisNexis is also ingested.
As noted above, MedISys is fully automatic. It is characterized by
language-independent algorithms that employ a range of keyword classification
methods and statistical analysis of trends by threat and country. MedISys uses a
clustering algorithm with an 8-hour window to detect and flag duplicate stories.
After clustering, positive articles are selected based on a set of Boolean
queries. Both standing queries and user-defined queries are used. Automatic
threat alerting is accomplished by searching for anomalies across aggregated
information within the previous week's news reports. Output to the end user
is provided in several formats, including biogeographic mapping, aggregated
graphs, alerting statistics, and a news search interface.
15.2.4 PULS (Helsinki University)
PULS is a non-governmental, working research prototype GHIS operated by
Helsinki University (Steinberger et al. 2008). The system became operational
in 2006, but its origins go back further, to work conducted at New York
University (Grishman et al. 2003). In 2007, a close collaboration was formed
between PULS and the MedISys group. This relationship was based on a
loose integration of the two systems over the Internet: PULS provided
relatively high-end language understanding, and MedISys provided document
sourcing and early-stage topic filtering. Currently, PULS operates only on
English-language news, although there are plans to incorporate French- and
Spanish-language news sources in the near future. In addition to MedISys,
PULS makes its output available to a variety of other EU organizations,
such as the European Centre for Disease Prevention and Control (ECDC)
and the National Public Health Institute in Finland (KTL).
The PULS ontology focuses mainly on agents and conditions, with an
extensive terminology harvested and verified from publicly available health
news collections such as ProMED-mail. The total number of agent concepts
is estimated at 1,000. Similarly, geographic terms are harvested from online
resources (e.g., CIA World Factbook) that allow global coverage, down to very
fine levels of granularity for the United States (Central Intelligence Agency 2008).
PULS is predominantly an automatic system that uses a range of language
technologies to extract structured frames of information from news articles.
The structured information is made available to users through a searchable
database table. One unique feature of PULS is its ability to incorporate a
confidence score into the events that it extracts. This is done by measuring
how many candidates could have filled an informational slot (such as for
country/location of event or disease condition) in the event frame. When a