• Datasets that have been annotated or coded have been manually labeled for phenomena of
interest.
• Two families of evaluation metrics widely used for mining tasks are precision/recall/F-score and ROC curves.
• ROUGE, Basic Elements and Pyramid are examples of intrinsic summarization evaluation
metrics, which measure the information content of a summary.
• Extrinsic summarization evaluation metrics measure how useful a summary is for a particular
task.
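The metrics above can be illustrated with a short sketch. This is a toy illustration, not an implementation from ROUGE or any other toolkit: it computes precision, recall and F-score from true/false positive and false negative counts, and a simplified ROUGE-1-style recall as unigram overlap with a single reference (real ROUGE supports multiple references, stemming and further options).

```python
from collections import Counter

def precision_recall_f(tp, fp, fn):
    """Precision, recall and (balanced) F-score from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def rouge1_recall(reference, summary):
    """Toy ROUGE-1 recall: fraction of reference unigrams (with
    multiplicity) that also appear in the candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(summary.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
score = rouge1_recall("the cat sat on the mat", "the cat ran")
```

A system that retrieves 8 of 12 relevant items while returning 2 spurious ones has precision 0.8 and recall 2/3; the F-score balances the two, which is why it is often preferred to reporting either number alone.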
2.6 FURTHER READING
The NLTK book, available online,12 has a chapter on "managing linguistic data," providing
information on how to create, format and document linguistic resources such as an annotated corpus.
NLTK itself13 contains many corpora, some of which are annotated.
Mani [2001b] provides a very good overview of summarization evaluation issues. While that
paper predates current evaluation toolkits such as ROUGE and Pyramid, it features a high-level
discussion of evaluation issues that is still relevant today, e.g., in its discussion of intrinsic vs. extrinsic
approaches.
12 http://www.nltk.org/book
13 http://www.nltk.org/