Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations


email-specific summarization techniques. Lam et al. conducted a small user study to gauge how suitable their summaries were for several tasks: email triage, cleanup, and calendaring. Interestingly, all of the participants stated that they would have liked action-oriented email summaries, indicating whether or not the recipient needed to take some course of action.

Rambow et al. [2004] created a sentence extraction approach for email thread summarization using supervised machine learning. They employ three classes of features: basic features common to any text, such as sentence length and average tf.idf score; message features, which take into account that an email thread is divided into multiple messages, such as the position of the message in the thread; and email features, which capture email-specific information, such as a sentence's overlap with the subject line and the number of recipients of the email. Their general finding is that supplementing the basic features with email features yields the best overall classification results.
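To make the three feature classes concrete, here is a minimal sketch of how one might compute a few of them for each sentence in a thread. It is not Rambow et al.'s implementation: the `Message` class, the tokenizer, and the choice to treat each message as one "document" for idf are all illustrative assumptions, and the resulting feature rows would then be fed to some supervised classifier.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical container for one email in a thread
    subject: str
    recipients: list[str]
    sentences: list[str]


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_table(thread):
    # Assumption: each message in the thread counts as one "document" for idf
    docs = [set(tokenize(" ".join(m.sentences))) for m in thread]
    df = Counter(w for d in docs for w in d)
    return {w: math.log(len(docs) / c) for w, c in df.items()}


def sentence_features(thread):
    idf = idf_table(thread)
    rows = []
    for pos, msg in enumerate(thread):
        subject_words = set(tokenize(msg.subject))
        for sent in msg.sentences:
            toks = tokenize(sent)
            tf = Counter(toks)
            rows.append({
                # Basic features (apply to any text)
                "length": len(toks),
                # Mean tf.idf over the sentence's distinct terms
                "avg_tfidf": sum(tf[w] * idf[w] for w in tf) / len(tf) if tf else 0.0,
                # Message feature: relative position of the message in the thread
                "msg_position": pos / max(len(thread) - 1, 1),
                # Email features
                "subject_overlap": (len(subject_words & set(toks)) / len(subject_words)
                                    if subject_words else 0.0),
                "num_recipients": len(msg.recipients),
            })
    return rows


thread = [
    Message("Upcoming events", ["a@example.com", "b@example.com"],
            ["Are you sending out upcoming events for this week?"]),
    Message("Re: Upcoming events", ["c@example.com"],
            ["Yes, the events list goes out Friday.", "Thanks."]),
]
rows = sentence_features(thread)
for r in rows:
    print(r)
```

Dropping the email features from each row is one way to reproduce the kind of comparison the authors report, where basic features alone are measured against basic plus email features.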

The Rambow system is also interesting for the manner in which its summaries are presented. The extracted sentences are processed by a module that wraps each sentence in additional text conveying the sender, the date, and the speech act of the sentence. For example, the following extracted sentence 1 would be converted to sentence 2:

1. Are you sending out upcoming events for this week?

2. In another subthread, on April 12, 2001, Kevin Danquoit wrote: Are you sending out upcoming events for this week?
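A template-based wrapper of this kind can be sketched in a few lines. The sketch only reproduces the shape of the example above; the `ExtractedSentence` class, the `other_subthread` flag, and the `verb` parameter (a stand-in for however the speech act is actually verbalized) are hypothetical and not taken from the Rambow system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExtractedSentence:
    text: str
    sender: str
    sent_on: date
    # Hypothetical flag: True if the sentence comes from a different
    # subthread than the one the summary is currently describing
    other_subthread: bool = False


def format_date(d: date) -> str:
    # e.g. "April 12, 2001"; the day is formatted by hand to avoid
    # platform-specific strftime flags for an unpadded day
    return f"{d.strftime('%B')} {d.day}, {d.year}"


def wrap(s: ExtractedSentence, verb: str = "wrote") -> str:
    # `verb` is an assumed hook for speech-act-dependent wording
    lead = "In another subthread, on" if s.other_subthread else "On"
    return f"{lead} {format_date(s.sent_on)}, {s.sender} {verb}: {s.text}"


example = ExtractedSentence(
    text="Are you sending out upcoming events for this week?",
    sender="Kevin Danquoit",
    sent_on=date(2001, 4, 12),
    other_subthread=True,
)
print(wrap(example))
```

Run as is, this prints sentence 2 from the example. Keeping the wrapper as a pure function of sentence metadata means the extraction step does not need to change when the template does.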

This wrapper module has the potential to increase the coherence of the extractive summary, which could otherwise suffer because its concatenated sentences have been removed from their original contexts. However, the authors did not evaluate the impact of this wrapper text. The wrapper module can be seen as a nod towards abstractive summarization, since it adds new text describing the email content at a higher level. More precisely, this is a form of hybrid extractive/abstractive summarization.


Whereas the Rambow et al. system is supervised, Newman and Blitzer [2003] present an unsupervised approach for summarizing very long email newsgroup conversations. The approach first clusters discussion messages by topic and then extracts sentences from each cluster. Initially, each message forms its own cluster, and at each step of the clustering process the two clusters connected by the most similar sentence pair are merged. Once clustering is complete, sentences are selected from each cluster based on a variety of scores, including email-specific features pertaining to the thread structure and quoted text. For example, a sentence from a particular email message is more likely to be considered important if it is subsequently quoted in other messages. This exploitation of quoted text is a classic example of email-specific summarization, resting on the intuition that sentences quoted in subsequent emails are likely to be important.
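The two stages can be sketched as follows. Merging the pair of clusters linked by their single most similar sentence pair is the single-link criterion of agglomerative clustering. Everything else here is an assumption for illustration, not Newman and Blitzer's actual choices: Jaccard word overlap as sentence similarity, stopping at a fixed number of clusters `k`, a hypothetical message format with a separate `quoted` list, and a selection score reduced to a single feature (how many other messages quote the sentence).

```python
import re
from itertools import combinations


def words(s):
    return set(re.findall(r"[a-z0-9']+", s.lower()))


def jaccard(a, b):
    # Assumed sentence similarity: word-set overlap
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster_messages(messages, k):
    # Start with one cluster per message
    clusters = [[mid] for mid in messages]

    def link(c1, c2):
        # Similarity of the most similar sentence pair across the two clusters
        return max((jaccard(s1, s2)
                    for m1 in c1 for s1 in messages[m1]["sentences"]
                    for m2 in c2 for s2 in messages[m2]["sentences"]),
                   default=0.0)

    # Greedily merge the best-linked pair until k clusters remain (assumed stop rule)
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def quote_count(sentence, source_id, messages):
    # Number of other messages that quote this sentence
    return sum(sentence in m["quoted"] for mid, m in messages.items() if mid != source_id)


def select(messages, clusters):
    # Pick the most-quoted sentence from each cluster
    summary = []
    for c in clusters:
        candidates = [(quote_count(s, mid, messages), s)
                      for mid in c for s in messages[mid]["sentences"]]
        if candidates:
            summary.append(max(candidates, key=lambda t: t[0])[1])
    return summary


messages = {
    "m1": {"sentences": ["The build server is down again.", "Lunch is at noon."], "quoted": []},
    "m2": {"sentences": ["Is the build server down for everyone?"],
           "quoted": ["The build server is down again."]},
    "m3": {"sentences": ["Noon lunch works for me."], "quoted": ["Lunch is at noon."]},
    "m4": {"sentences": ["Restarting the build server now."],
           "quoted": ["The build server is down again."]},
}
clusters = cluster_messages(messages, k=2)
summary = select(messages, clusters)
print(clusters)
print(summary)
```

Here the most-quoted sentence, "The build server is down again.", is chosen to represent its cluster, mirroring the intuition described above. A fuller version would combine several scores rather than relying on quote counts alone.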

While Newman and Blitzer focus on newsgroup discussions, Wan and McKeown [2004] focus on summarizing another particular type of email discussion, in which the conversation represents a decision-making process. The system works by identifying an issue sentence in the originating
