CHAPTER 2
Background: Corpora and Evaluation Methods
In this chapter we describe some of the conversation datasets that are widely used for summarization and text mining research. In NLP, large collections of documents, possibly with annotations, are called corpora (sing. corpus), and we will use this terminology. We characterize the raw data as well as the available annotations. Most of the techniques presented in this book rely on machine learning methods that need to be trained and tested using such corpora. We then detail the evaluation metrics that are commonly used for summarization and text mining tasks.
2.1 CORPORA AND ANNOTATIONS
In this section, we introduce two meeting corpora and two email corpora, all of which are freely
available. We describe the annotations (or codings) that are most relevant and useful for summarization and text mining. When we say that a corpus has been annotated or coded for a particular task such as summarization, we mean that human judges have manually labeled the data for the phenomena relevant to that task. For summarization, this typically means identifying the most important sentences and writing a high-level abstract summary of the document, but we will describe such annotation schemes in detail momentarily.
At points we refer to the κ statistic for a given set of annotations, which measures agreement between multiple annotators, factoring in the probability of chance agreement [Carletta, 1996]. More
precisely, κ is used to measure agreement between each pair of annotators where the annotators are
making category judgments. In the case of extractive summarization, for example, the category
judgment is whether or not each sentence should be extracted. In the case of opinion mining, to take another example, the judgment is whether the sentence has a positive, negative, or neutral
polarity.
Given two sets of codings representing the category judgments of two annotators, κ is calculated as

κ = (P(A) − P(E)) / (1 − P(E)),
where P(A) is the proportion of times the annotators agree with one another and P(E) is the
proportion of agreement that we would expect based purely on chance. When multiple coders are carrying out annotations on the same data, we expect some baseline level of agreement just by chance.
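As a concrete illustration of the formula, the short Python sketch below computes κ for two annotators' binary extract/don't-extract codings of the same ten sentences. The labels and the function name cohens_kappa are hypothetical, and P(E) is estimated here from each annotator's individual label distribution (the Cohen formulation); other definitions of chance agreement are also in use.

```python
# A minimal sketch, assuming two annotators have each made a binary
# extract / don't-extract judgment on the same set of sentences.
# The labels below are invented purely for illustration.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Return kappa = (P(A) - P(E)) / (1 - P(E)) for two label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)

    # P(A): observed proportion of items on which the annotators agree.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # P(E): chance agreement, estimated from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_chance = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )

    return (p_agree - p_chance) / (1 - p_chance)

# Hypothetical extractive-summarization codings: 1 = extract, 0 = do not extract.
annotator_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.583
```

In this toy example the annotators agree on 8 of 10 sentences (P(A) = 0.8), while their label distributions alone would produce agreement 0.52 by chance, giving κ = (0.8 − 0.52)/(1 − 0.52) ≈ 0.58.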