Table 1. Results for query Justin Bieber

        I       II      III     IV      V
I     #136    75.64   78.95   78.48   85.37
II    69.01   #143    93.97   86.00   97.17
III   76.71   97.52   #172    92.86   96.09
IV    74.70   89.19   88.52   #196    95.10
V     67.77   79.61   80.66   81.13   #157

Table 2. Results for query Fukushima

        I       II      III     IV      V
I     #121    81.03   83.61   81.35   87.50
II    80.26   #129    93.46   87.36   98.48
III   85.00   94.59   #131    91.67   92.22
IV    74.65   89.13   85.26   #178    91.58
V     72.93   80.04   83.19   82.26   #132
Table 3. Results for query New York

        I       II      III     IV      V
I     #175    81.39   88.24   85.15   71.05
II    76.60   #169    93.53   86.51   74.36
III   90.00   95.79   #280    92.35   73.28
IV    84.43   92.72   93.17   #230    83.49
V     81.11   83.90   73.77   79.87   #166

Table 4. Summary for NER Evaluation

        I       II      III     IV      V
I     #432    79.25   83.60   81.66   81.31
II    75.29   #441    93.65   86.62   90.00
III   83.90   95.97   #583    92.29   87.19
IV    83.90   95.97   583     #604    87.19
V     73.94   81.18   79.21   81.09   #455
Tables 1, 2, and 3 show the main results for the three different corpora; Table 4 summarises them. All numbers denote percentages indicating how many relevant⁶ NEs extracted by the algorithm in the row were also extracted by the algorithm in the column. For example, in the dataset "Justin Bieber", TEP extracted 85.37% of the NEs that had been extracted by SProUT; AlchemyAPI extracted 75.64% and StanfordNER 78.95% of the NEs extracted by SProUT. Numbers preceded by "#" give the total number of extracted NEs. The following Roman numerals denote the different algorithms: I = SProUT, II = AlchemyAPI, III = StanfordNER, IV = OpenNLP, and V = TEP.
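The pairwise percentages in the tables can be reproduced from the sets of NEs each algorithm extracted. A minimal sketch, using toy data rather than the paper's actual extractions (the function name and sets are illustrative):

```python
def coverage_matrix(extractions):
    """Pairwise coverage: for each pair (row, col), the percentage of
    NEs found by the row algorithm that the column algorithm also found.
    On the diagonal, the total number of extracted NEs (the "#" cells)."""
    names = list(extractions)
    matrix = {}
    for row in names:
        matrix[row] = {}
        for col in names:
            if row == col:
                matrix[row][col] = len(extractions[row])
            else:
                overlap = extractions[row] & extractions[col]
                matrix[row][col] = round(100 * len(overlap) / len(extractions[row]), 2)
    return matrix

# Toy NE sets for two of the systems (illustrative only).
extractions = {
    "SProUT": {"Fukushima", "TEPCO", "Japan", "Tokyo"},
    "TEP": {"Fukushima", "TEPCO", "tsunami"},
}
m = coverage_matrix(extractions)
# m["SProUT"]["TEP"] → 50.0: TEP found 2 of the 4 NEs SProUT found.
```

Note that the matrix is not symmetric: the denominator is always the row algorithm's set, which is why cell (I, V) and cell (V, I) differ in the tables.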
Keeping in mind that our approach always starts with a topic around which all the NEs are grouped, i.e. NE recognition is biased or directed, it is hard to define a gold standard, i.e. to manually annotate all NEs that are important in a specific context. In the context of the query "Fukushima", most people would agree that word groups describing the nuclear power plant disaster clearly are NEs. Some would also agree that terms like "earthquake" or "tsunami" function as NEs in this specific context. Given a query like "New York", people would probably not agree that "earthquake" should function as a specific term in this context. Of course, there are NEs of generic types such as "persons", "locations", or "companies", but it is questionable whether they suffice in the context of our task.
Hence we compared the systems directly on the results they computed. The main interest in our evaluation was whether the NEs extracted by one algorithm can also be extracted by the other algorithms. Furthermore, we used a very simple rating scheme in which detected NEs with more occurrences are considered more important than those with lower frequencies.⁷
⁶ Relevance here means that an NE must occur more than 4 times in the whole dataset. The value has been determined experimentally.
⁷ Except for TEP, where we used the PMI as described above.
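The relevance threshold from footnote 6 and the frequency-based rating scheme can be sketched as follows; the function name and `min_count` parameter are illustrative, not taken from the paper:

```python
from collections import Counter

def rate_by_frequency(all_mentions, min_count=5):
    """Keep only NEs occurring more than 4 times (min_count=5),
    then rank the remaining NEs by occurrence count, so that more
    frequent NEs are rated as more important."""
    counts = Counter(all_mentions)
    relevant = {ne: c for ne, c in counts.items() if c >= min_count}
    return sorted(relevant.items(), key=lambda kv: kv[1], reverse=True)

# Toy list of NE mentions across a dataset (illustrative only).
mentions = ["Fukushima"] * 7 + ["TEPCO"] * 5 + ["Tokyo"] * 2
ranking = rate_by_frequency(mentions)
# "Tokyo" falls below the threshold and is filtered out.
```

For TEP itself, the text notes that PMI is used instead of raw frequency for the rating; this sketch covers only the simple frequency scheme applied to the other systems.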