After creating the abstract and the relevant links, annotators were allowed to select email sentences that they considered important but that were not linked to the abstract. Likewise, they could remove a linked email sentence from their extract if they considered it unimportant despite its link to the abstract. This annotation scheme allows researchers to closely investigate the relationship between extracts and abstracts. The scheme closely follows the methods used by researchers in the AMI project in annotating their meeting corpus [Carletta, 2006].
Three people annotated each thread. Their annotations had a κ agreement of 0.50 for the extracted sentences. This compares with a κ statistic of 0.45 for the analogous summarization annotation on the AMI meeting corpus [Carletta, 2006] and 0.31 on the ICSI meeting corpus [Janin et al., 2003]. In total, 10 recruits carried out the annotation.
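To make these agreement figures concrete, the following is a minimal sketch of how a multi-annotator κ (here, Fleiss' κ) can be computed over binary extract labels, with several annotators marking each sentence as in or out of the extract. The function and the toy labels are purely illustrative; they are not the corpus data or the annotation tooling used in the projects above.

from collections import Counter

def fleiss_kappa(ratings, categories):
    """ratings: one tuple of labels per sentence, one label per annotator."""
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # Observed agreement: for each sentence, the proportion of
    # annotator pairs that assigned the same label.
    p_items = []
    for labels in ratings:
        counts = Counter(labels)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        p_items.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_observed = sum(p_items) / n_items

    # Chance agreement from the marginal distribution of labels.
    totals = Counter(label for labels in ratings for label in labels)
    p_chance = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)

    return (p_observed - p_chance) / (1 - p_chance)

# Toy example: three annotators labeling five sentences as
# extracted (1) or not extracted (0).
ratings = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 0, 0), (0, 0, 1)]
print(fleiss_kappa(ratings, categories=(0, 1)))  # ~0.20

On the commonly used Landis and Koch scale, κ values in the 0.4 to 0.6 range, like those reported above, indicate only moderate agreement, which is typical for subjective annotation tasks such as extractive summarization.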
Annotators were also asked to label a variety of sentence-level phenomena, including whether each sentence was subjective. In a second round of annotations, three different annotators were asked to go through all of the sentences previously labeled as subjective and indicate whether each sentence was positive, negative, positive-negative, or other. The definitions for positive and negative subjectivity mirrored those given by Wilson [2008] and used for annotating the AMI corpus, mentioned above.
2.1.3 BLOG CORPORA
To our knowledge, there is no freely available corpus of conversational blog data complete with annotations for summarization and mining purposes. Perhaps the most widely used blog corpus for automatic summarization research is the dataset released as part of the Text Analysis Conference (TAC, formerly known as the Document Understanding Conference, or DUC) 2008 track on opinion summarization.⁸ This dataset consists of blog posts on a variety of given topics. The task was to automatically summarize opinions on a person, entity, or topic by analyzing numerous blog posts on that topic. For example, one cluster of blog posts concerned the company Jiffy Lube, and the task was to summarize what people think of that company. However, the blog posts are not truly conversational; individual posts do not include comments, and the posts do not link or refer to each other.
We believe it would be of great benefit to the research community to annotate and release a corpus of blog conversations. This entails defining the summarization and mining tasks for blog data more clearly. In some cases, we may be interested in analyzing how a set of blog comments reflects on, or expands upon, the initial post. In other cases, we may want to analyze blog conversations more broadly, examining how bloggers link and respond to one another across blogs.
2.2 EVALUATION METRICS FOR TEXT MINING
In this section we discuss evaluation metrics that are commonly used for a wide variety of text mining
tasks such as summarization, sentiment detection and topic modeling.
⁸ http://www.nist.gov/tac/2008/summarization/