Geoscience Reference
In-Depth Information
GeoCLEF [GEY 05]) and of the evaluation framework TempEval [VER 09]. For
example, GeoCLEF 2008 contains articles from different newspapers (169,477
articles in English, 294,809 in German and 210,734 in Portuguese) and 25 topics
(queries integrating spatial and thematic criteria).
Figure 2.4. IRS evaluation campaigns
Other manually annotated reference corpora have been proposed for the
evaluation of toponym resolution tasks. Table 2.2 lists a few examples taken
from [AND 10]. Note that these corpora consist mainly of press articles
and that the reference language is English.
                                      TR-CLEF    TR-RNW     TR-CoNLL   TR-MUC4
Reference                             [AND 10]   [AND 10]   [LEI 07]   [LEI 07]
Size of corpus (in tokens)            360,559    6,010      204,566    30,051
Number of documents                   321        556        946        100
Toponym instances                     5,783      2,338      6,980      278
Distinct toponyms                     802        432        1,299      135
Ambiguous distinct toponyms           690        332        -          -
Non-ambiguous distinct toponyms       112        102        -          -
Human annotators                      2          1          4          2
Table 2.2. Corpora dedicated to the evaluation of toponym resolution tasks
(table created from data presented in [AND 10])
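The counts in Table 2.2 can be turned into simple derived indicators, such as the share of distinct toponyms that are ambiguous or the average number of instances per distinct toponym. The following is a minimal sketch in Python, not taken from [AND 10]. The figures are copied from the table as printed, and the function names `ambiguity_share` and `instances_per_distinct` are illustrative assumptions.

```python
# Counts copied from Table 2.2 as printed; None marks cells shown as "-".
corpora = {
    "TR-CLEF":  {"instances": 5783, "distinct": 802,  "ambiguous": 690},
    "TR-RNW":   {"instances": 2338, "distinct": 432,  "ambiguous": 332},
    "TR-CoNLL": {"instances": 6980, "distinct": 1299, "ambiguous": None},
    "TR-MUC4":  {"instances": 278,  "distinct": 135,  "ambiguous": None},
}


def ambiguity_share(row):
    """Fraction of distinct toponyms that are ambiguous, or None if not reported."""
    if row["ambiguous"] is None:
        return None
    return row["ambiguous"] / row["distinct"]


def instances_per_distinct(row):
    """Average number of occurrences per distinct toponym."""
    return row["instances"] / row["distinct"]


for name, row in corpora.items():
    share = ambiguity_share(row)
    share_text = "n/a" if share is None else f"{share:.1%}"
    print(f"{name:9s} ambiguous: {share_text:>6s}  "
          f"instances per distinct toponym: {instances_per_distinct(row):.1f}")
```

For TR-CLEF, for instance, 690 of the 802 distinct toponyms are ambiguous, which is roughly 86%. The two corpora taken from [LEI 07] report no ambiguity counts, so the sketch leaves that indicator undefined for them rather than guessing.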
Resources or spatial gazetteers are associated with these evaluation campaigns.
These resources are mainly composed of lists of toponyms and of corresponding
geometries. Numerous complementary pieces of information, such as the type
(feature class) of the toponym, can be associated with the description. These
resources are necessary for the recognition, validation and interpretation of spatial
 