Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations

Figure 4.1: Word cloud representing email discussion.

4.3 SUMMARIZING CONVERSATIONS IN ONE DOMAIN

In the following subsections, we consider each conversation domain in turn and describe work in

that area. For each domain, we first introduce and briefly describe case studies of summarization

systems that have been developed. We then compare and contrast those case studies and use them

as a jumping-off point for a more general discussion of critical issues in that domain.

4.3.1 SUMMARIZING EMAILS

In this section we first introduce existing work on email summarization, highlighting individual

systems and techniques that have been successful and/or influential. We subsequently use those

case studies to further a discussion on inputs and assumptions, measures of informativeness, and

outputs and interfaces for email summarization, comparing and contrasting the systems as we go.

The focus of this section will be almost exclusively on extractive techniques, as the vast majority of

email summarization research has been extractive.

Email Summarization Case Studies. Work on email summarization can be divided into

summarization of individual emails and summarization of email threads. Muresan et al. [2001] take the

approach of summarizing individual email messages, first using linguistic techniques to extract noun

phrases and then employing machine learning methods to label the extracted noun phrases as salient

or not. Summarization of individual emails is useful, for example, for email triage and for displaying

incoming emails on small handheld devices. Since we are interested in conversational

data, we will focus here on describing techniques for summarization of entire email threads.

Lam et al. [2002] take an approach to email summarization that is a hybrid between single

email summarization and thread summarization. Their system summarizes individual emails but

in a thread-aware manner, so that the summarized email is presented with some context from the

preceding email messages. Messages subsequent to the one being summarized are ignored. The

system also extracts features from the emails, such as dates, people's names, and company names,

which are presented as a list alongside the summary text. The summarization component itself is treated as

a black box, with the authors testing several standard summarizers and finding little performance

difference. This stands in contrast with many approaches described below, where researchers explore
