Geoscience Reference
In-Depth Information
campaigns mainly target the recognition and categorization of textual units referring
to the names of people, businesses and places as well as other phrases such as dates,
time, monetary units and percentages, which can be recognized by specific grammar
rules [GRI 96, POI 03].
Chinchor [CHI 97] associates the place NE category with a place name having a
political or geographical connotation (municipality, district, county, region, country
as well as toponyms, hydronyms, oronyms, etc.), generally described in external
resources. Furthermore, in translation research, Bauer [BAU 85]
associates proper nouns with six distinct categories, one of which can be linked to
historical periods: praxonyms, which denote historical facts, diseases or cultural events. Textual
documents thus contain several types of temporal information: NEs of type date (more
or less complete calendar information, e.g. “summer of 1860”) and NEs of type
praxonym (“Saint-Barthélemy”, “Great Irish Famine”), which also correspond to
calendar periods. We use the term calendar to refer to information that determines the
beginning, length and order of a period, and which may be described by years and their
divisions.
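To make the idea of calendar information concrete, the following minimal sketch maps a date NE such as “summer of 1860” to a calendar period with a beginning and a length. The function name `season_to_period` and the season boundaries (meteorological seasons, Northern Hemisphere) are assumptions made for illustration and do not come from the cited works; winter is left out because it spans two calendar years, and praxonyms would require a lookup resource that is not shown here.

```python
import re
from datetime import date

# Assumed convention (not from the source): meteorological seasons,
# Northern Hemisphere, given as ((start month, day), (end month, day)).
SEASONS = {
    "spring": ((3, 1), (5, 31)),
    "summer": ((6, 1), (8, 31)),
    "autumn": ((9, 1), (11, 30)),
}

def season_to_period(expression):
    """Map '<season> of <year>' to a (start, end) pair of dates, or None."""
    match = re.fullmatch(r"(spring|summer|autumn) of (\d{4})",
                         expression.strip().lower())
    if match is None:
        return None  # not a season expression handled by this sketch
    (start_month, start_day), (end_month, end_day) = SEASONS[match.group(1)]
    year = int(match.group(2))
    return date(year, start_month, start_day), date(year, end_month, end_day)

start, end = season_to_period("summer of 1860")
length_in_days = (end - start).days + 1  # both bounds inclusive
```

The pair of dates gives the beginning and length of the period, and comparing the start dates of two periods gives their order.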
Named entity recognition (NER) consists of processing
a stream of words produced by an earlier lexical analysis. An NE detector generally
uses either a machine learning or an ad hoc rule-based approach to detect and categorize NEs
[POI 03]. Learning-based NER relies on texts manually labeled by experts:
statistical analysis methods (the texts are considered to be a stream of characters)
allow the construction of generic patterns that can then be applied to a larger corpus. The
ad hoc approach is based on lexical patterns constructed manually with the help of
experts: “a proper noun preceded by the preposition at is potentially a place” is an
example of such a pattern. These patterns are then applied to a corpus.
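The example pattern quoted above can be sketched as a regular expression. This is only an illustration under simplifying assumptions: a proper noun is approximated by a run of capitalized words, and the names `AT_PLACE` and `candidate_places` are hypothetical rather than taken from any of the tools or works cited.

```python
import re

# Assumption: a "proper noun" is approximated by one or more capitalized
# words; the rule fires when such a run directly follows the preposition "at".
AT_PLACE = re.compile(r"\bat\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)")

def candidate_places(text):
    """Return the capitalized phrases that directly follow 'at'."""
    return [match.group(1) for match in AT_PLACE.finditer(text)]

sample = "They arrived at Pau in 1860 and met Henry Russell at Gavarnie."
places = candidate_places(sample)
```

Such a rule only yields candidates: a phrase like “at Christmas” also matches, which is why the pattern describes the proper noun as potentially a place.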
There are numerous tools for the automatic recognition of NEs. GATE ANNIE 1,
LingPipe 2, OpenCalais 3, Stanford NER 4 and OpenNLP 5 are general-purpose tools
that annotate several categories of NEs; MetaCarta 6 and
Yahoo! Placemaker 7 target NEs corresponding to places, while GuTime 8 and
HeidelTime 9 are dedicated to the recognition of NEs corresponding to dates.
Brat 10 is a graphical environment for the manual annotation and editing of
NEs.
1 http://gate.ac.uk/ie/annie.html/.
2 http://alias-i.com/lingpipe/.
3 http://www.opencalais.com/.
4 http://www-nlp.stanford.edu/software/CRF-NER.shtml.
5 http://opennlp.apache.org/.
6 http://www.metacarta.com/.
7 http://developer.yahoo.com/geo/placemaker/; http://developer.yahoo.com/boss/geo/.
8 http://timeml.org/site/tarsqi/modules/gutime/.
9 http://dbs.ifi.uni-heidelberg.de/index.php?id=106/.
10 http://brat.nlplab.org/.
 