Figure 4.1: Word cloud representing email discussion.
4.3 SUMMARIZING CONVERSATIONS IN ONE DOMAIN
In the following subsections, we consider each conversation domain in turn and describe work in
that area. For each domain, we first introduce and briefly describe case studies of summarization
systems that have been developed. We then compare and contrast those case studies and use them
as a jumping-off point for a more general discussion of critical issues in that domain.
4.3.1 SUMMARIZING EMAILS
In this section we first introduce existing work on email summarization, highlighting individual
systems and techniques that have been successful and/or influential. We subsequently use those
case studies to further a discussion on inputs and assumptions, measures of informativeness, and
outputs and interfaces for email summarization, comparing and contrasting the systems as we go.
The focus of this section will be almost exclusively on extractive techniques, as the vast majority of
email summarization research has been extractive.
Email Summarization Case Studies. Work on email summarization can be divided into summarization
of individual emails and summarization of email threads. Muresan et al. [2001] take the
approach of summarizing individual email messages, first using linguistic techniques to extract noun
phrases and then employing machine learning methods to label the extracted noun phrases as salient
or not. Summarization of individual emails is a useful task for email triage and for displaying incoming
emails on small handheld devices, to give two examples. Since we are interested in conversational
data, we will focus here on describing techniques for summarization of entire email threads.
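
The two-stage idea attributed to Muresan et al. above (extract candidate noun phrases, then label each as salient or not) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stopword-based chunker is a crude stand-in for real linguistic noun-phrase extraction, and the hand-set weights and threshold in `label_salient` stand in for a classifier that would be learned from labeled data. The function names, features, and weights are hypothetical and are not taken from the original system.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a tagger or chunker instead.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "be", "will", "to", "of", "and",
    "in", "on", "for", "at", "as", "with", "we", "you", "it", "this", "that",
    "please",
}


def candidate_phrases(text):
    """Stage 1 (stand-in): treat maximal runs of non-stopwords as candidate phrases."""
    phrases = []
    # Split on punctuation so that a phrase never spans a clause boundary.
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for token in re.findall(r"[a-z][a-z'-]*", chunk):
            if token in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(" ".join(current))
    return phrases


def label_salient(subject, body, threshold=2.0):
    """Stage 2 (stand-in): label each candidate phrase as salient or not.

    The weights below are illustrative assumptions; a machine learning
    approach would fit them to annotated examples.
    """
    counts = Counter(candidate_phrases(body))
    subject_words = set(re.findall(r"[a-z][a-z'-]*", subject.lower()))
    labels = {}
    for phrase, freq in counts.items():
        words = phrase.split()
        score = 1.0 * (freq - 1)                      # repeated phrases score higher
        if any(word in subject_words for word in words):
            score += 2.0                              # overlap with the subject line
        if len(words) >= 2:
            score += 0.5                              # multi-word phrases
        labels[phrase] = score >= threshold
    return labels


if __name__ == "__main__":
    result = label_salient(
        "Budget meeting",
        "The budget meeting is moved to Friday. Please bring the quarterly report.",
    )
    for phrase, salient in result.items():
        print(phrase, salient)
```

The phrases labeled salient could then be shown as a short gist of the message, for instance on a small display. Swapping the scoring rule for a trained classifier would leave the rest of the pipeline unchanged.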
Lam et al. [2002] take an approach to email summarization that is a hybrid between single
email summarization and thread summarization. Their system summarizes individual emails but
in a thread-aware manner, so that the summarized email is presented with some context from the
preceding email messages. Messages subsequent to the one being summarized are ignored. The
system also extracts features from the emails, such as dates, people's names and company names,
presented as a list along with the summary text. The summarization component itself is treated as
a black box, with the authors testing several standard summarizers and finding little performance
difference. This stands in contrast with many approaches described below, where researchers explore