Intrinsic evaluations measure the information content of a generated summary, typically by
comparing it with human gold-standard summaries. These types of evaluations are concerned with
whether the candidate summary contains the most important information from the source document.
Many of the intrinsic evaluation schemes we will introduce are automated metrics, and as such
it is important to confirm that they correlate with human judgments. A major reason why the
summarization community has been slow to adopt “official” evaluation metrics (compared with,
say, the machine translation community) stems precisely from conflicting results regarding such
correlations in different domains. Liu and Liu [2010] is a recent example of work trying to measure
the usefulness of a popular intrinsic evaluation software package (ROUGE, described in Chapter 2)
on noisy conversational data.
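To make the idea of automated intrinsic evaluation concrete, the following is a minimal sketch of ROUGE-1 recall, i.e., the fraction of unigrams in a human gold-standard summary that also appear in the candidate summary. This is a simplified illustration, not the official ROUGE implementation, which additionally supports stemming, stopword removal, longer n-grams, and multiple reference summaries.

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams covered by the candidate,
    with counts clipped so a repeated candidate word cannot
    match more reference occurrences than actually exist."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[word], count) for word, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0

# Toy example: 4 of the 6 reference unigrams appear in the candidate.
print(rouge_1_recall("the cat sat on the mat",
                     "the cat lay on a mat"))  # → 0.666...
```

Validating such a metric then amounts to checking that these scores correlate with human judgments of summary quality on the domain of interest, which, as noted above, has proven inconsistent across domains.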
Extrinsic evaluations, on the other hand, measure the usefulness of a summary in aiding
some real-world task, such as document classification or reading comprehension. The motivation
for conducting extrinsic evaluations is that summaries are generated for some purpose, and we should
directly evaluate how well they serve that purpose, rather than simply comparing them with other
summaries. However, extrinsic evaluations are typically user studies, which demand a great deal of
human effort in design, recruitment, experimentation, and analysis. It is therefore common to employ
intrinsic evaluations regularly to speed research and development, while carrying out extrinsic
evaluations occasionally to assess major development milestones.
1.5 TOPIC PREVIEW
In Chapter 2, we describe popular conversation corpora for summarization and mining research, in-
cluding descriptions of the relevant annotations. We also describe in detail the widely used evaluation
metrics for both text mining in general and automatic summarization in particular.
In Chapter 3, we introduce mining tasks and methods for conversational data. This includes
topic segmentation and labeling, subjectivity and sentiment detection, dialogue act detection, ex-
traction of conversation structure, and detection of decisions and action items.
In Chapter 4, we first give a general characterization of the architecture of summarization
systems, then describe how summarizers have been designed for particular conversation modalities.
We also describe attempts at developing summarizers for conversations across modalities, and give
a detailed case study of an abstractive, multi-modal conversation summarizer.
In Chapter 5, we review our discussion and lay out suggestions for future work in the promising
and still largely unexplored corners of the mining and summarization research space.
Assumptions about Our Readers We have tried to make this book accessible by providing sufficient
background on each topic, and think that it should be suitable for a graduate student who may
have a background in computer science or linguistics but only minimal exposure to NLP. However,
due to space limitations, we do assume that our readers are at least somewhat familiar with several
topics, including basic probability and machine learning. In Section 1.3, we have provided pointers