10.6 Evaluation of Combining Spatiotemporal Terms
The methods demonstrated in this research show how extracted spatiotemporal
information can be integrated to represent the dynamics of events described in Web
news reports. A preliminary evaluation of these results has been undertaken based
on 20 news articles, selected as a sample data set for evaluating the methods. Kappa
statistics are applied to provide a standard against which the combinations of
spatiotemporal information associated with events can be compared (Viera and
Garrett 2005). In
this evaluation, results automatically processed by the system are compared with
results manually processed by humans. Five volunteers were recruited as human
evaluators to assess the performance of the system. None of the volunteers had
GIR or NLP experience. The volunteers were trained to process spatiotemporal
information from text documents using two sample news reports with
instructions. Each volunteer manually annotated spatial and temporal terms from
20 news reports, and combined the annotated spatial and temporal results based on
the context in the text documents. The spatiotemporal results from the volunteers
are compared. A result from the manual test is accepted into the standard when
either all volunteers agree or four out of five agree. Results on which only three
or two out of five volunteers agree must be rechecked by all volunteers, and
results with 0% agreement are excluded.
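The agreement rule above can be sketched in a few lines of Python. This is only
an illustration, not the procedure used in the study: the function name
agreement_status, the status labels, and the example place and time values are
hypothetical, and the sketch assumes that "0% agreement" means no two volunteers
produced the same result.

```python
from collections import Counter


def agreement_status(annotations):
    """Classify one reference by how many volunteers gave the same result.

    `annotations` holds one hashable result per volunteer, for example a
    (place, time) tuple. The labels returned here are hypothetical.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    n = len(annotations)
    # Size of the largest group of identical results.
    top_count = Counter(annotations).most_common(1)[0][1]
    # Assumption: 0% agreement means every volunteer gave a different result.
    if n > 1 and top_count == 1:
        return "exclude"
    # At least four out of five (80%) volunteers agree: accept.
    if top_count * 5 >= 4 * n:
        return "accept"
    # Three or two out of five agree: send back to all volunteers.
    return "recheck"


# Hypothetical results from five volunteers for one reference.
votes = [("Boston", "Monday")] * 4 + [("Boston", "Tuesday")]
print(agreement_status(votes))
```

Using the size of the largest group of identical results keeps the rule
independent of which particular result the volunteers agreed on.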
Manually-derived spatiotemporal combinations, and the results obtained from
automatically processing the text using the approach presented in this research,
are each compared with the standard. The number of correctly detected references,
incorrectly detected references, and missed references for the users and the system
are determined. A criterion for the evaluation of spatiotemporal extraction of events
is developed based on adapting traditional precision and recall evaluation metrics
as used in IR (Manning et al. 2008). Precision is the ratio of the number
of correctly resolved spatiotemporal references to the number of spatiotemporal
references that the system or the users attempt to resolve. Recall is the ratio of
the number of correctly resolved spatiotemporal references to the number of all
references in the standard, that is, the correctly resolved plus the missed
references.
• Precision = correctly detected references / (correctly detected references +
  incorrectly detected references)
• Recall = correctly detected references / (correctly detected references +
  missed references)
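As a worked check of these two definitions, the following minimal Python sketch
computes both metrics from the three counts. The function name precision_recall
is hypothetical; the input counts are the averages reported for this preliminary
evaluation (116 correct, 10 incorrect, and 8 missed references).

```python
def precision_recall(correct, incorrect, missed):
    """Return (precision, recall) from counts of detected references."""
    attempted = correct + incorrect  # references that were resolved at all
    relevant = correct + missed      # references that should have been found
    if attempted == 0 or relevant == 0:
        raise ValueError("precision or recall is undefined for these counts")
    return correct / attempted, correct / relevant


# Average counts reported for the preliminary evaluation.
precision, recall = precision_recall(correct=116, incorrect=10, missed=8)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.92 recall=0.94"
```

Here 116 / 126 rounds to 0.92 and 116 / 124 rounds to 0.94, which matches the
values reported in the text.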
Precision and recall are calculated based on the metrics defined above. Table 10.1
shows the results for precision and recall based on human performance and the
performance results for the algorithms. The table shows the average results for
the human subjects. These results are based on 126 processed spatiotemporal
references in the text documents. For this preliminary evaluation, there was an
average of 116 correct references, 10 incorrect references, and 8 missed references.
This results in precision and recall values of 0.92 and 0.94, respectively. For the