Datasets that have been annotated or coded have been manually labeled for phenomena of interest.
Two evaluation approaches widely used for mining tasks are precision/recall/F-score and ROC curves.
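Precision, recall, and F-score can be computed directly from true-positive, false-positive, and false-negative counts. A minimal sketch for a binary labeling task, with illustrative gold and predicted labels (the data here is made up for the example):

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and F1 for one class of a binary task."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative labels: 2 true positives, 1 false positive, 1 false negative.
gold      = [1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(gold, predicted)  # each is 2/3 here
```

F-score is the harmonic mean of precision and recall, so it rewards systems that balance the two rather than maximizing one at the expense of the other.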
ROUGE, Basic Elements, and Pyramid are examples of intrinsic summarization evaluation metrics, which measure the information content of a summary. Extrinsic summarization evaluation metrics measure how useful a summary is for a particular task.
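ROUGE scores a candidate summary by its n-gram overlap with human reference summaries. A minimal sketch of the ROUGE-1 recall component (unigram overlap with clipped counts; the example sentences are illustrative, and real ROUGE adds stemming and multi-reference handling):

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Fraction of reference unigrams covered by the candidate,
    clipping each word's credit at its reference count."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
score = rouge1_recall(reference, candidate)  # 5 of 6 reference unigrams matched
```

Because it compares only surface n-grams, ROUGE is cheap and reproducible, which is what makes it an intrinsic metric: it says nothing about whether the summary helps with any downstream task.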
2.6 FURTHER READING
The NLTK book, available online,12 has a chapter on "managing linguistic data," providing information on how to create, format, and document linguistic resources such as an annotated corpus. NLTK itself13 contains many corpora, some of which are annotated.
Mani [2001b] provides a very good overview of summarization evaluation issues. While that paper predates current evaluation toolkits such as ROUGE and Pyramid, its high-level treatment of evaluation issues remains relevant today, e.g., its discussion of intrinsic vs. extrinsic approaches.