email-specific summarization techniques. Lam et al. conducted a small user study to gauge the
perceived suitability of their summaries for several tasks: email triage, cleanup and calendaring. An
interesting finding is that all of the user study participants stated that they would have liked action-
oriented email summaries, indicating whether or not the email recipient needed to take some course
of action.
Rambow et al. [2004] created a sentence extraction approach for email thread summarization
using supervised machine learning. They employ three classes of features: basic features common to
any text, such as sentence length and an average tf.idf score, message features that take into account
that an email thread is divided into multiple messages, such as the position of the message in the
thread, and email features that capture email-specific information, such as a sentence's subject-line
overlap and the number of recipients for an email. Their general finding is that supplementing
the basic features with email features yields the best overall classification results.
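The three feature classes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the thread representation (a list of dictionaries with `subject`, `recipients` and `sentences` keys), the tokenizer, the idf estimate and the exact definition of each feature are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; any tokenizer would do)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def inverse_document_frequency(thread):
    """Estimate idf by treating every sentence in the thread as a document."""
    docs = [set(tokenize(s)) for m in thread for s in m["sentences"]]
    df = Counter(t for d in docs for t in d)
    return {t: math.log(len(docs) / n) for t, n in df.items()}


def sentence_features(sentence, message_index, thread, idf):
    """Return one feature of each kind for a sentence of thread[message_index]."""
    message = thread[message_index]
    tokens = tokenize(sentence)
    counts = Counter(tokens)

    # Basic features: usable for any text.
    avg_tfidf = (
        sum(tf * idf.get(t, 0.0) for t, tf in counts.items()) / len(counts)
        if counts else 0.0
    )

    # Email feature: share of subject-line words that recur in the sentence.
    subject = set(tokenize(message["subject"]))
    overlap = len(subject & set(tokens)) / len(subject) if subject else 0.0

    return {
        "length": len(tokens),                                        # basic
        "avg_tfidf": avg_tfidf,                                       # basic
        "message_position": message_index / max(len(thread) - 1, 1),  # message
        "subject_overlap": overlap,                                   # email
        "num_recipients": len(message["recipients"]),                 # email
    }
```

The resulting dictionaries could be passed to any supervised classifier that labels each sentence as belonging in the summary or not. The choice of classifier is left open here.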
The Rambow system is also interesting for the manner in which the summaries are presented.
The extracted sentences are processed by a module that wraps each sentence in additional text,
conveying information about the sender, the date and the speech act of the sentence. For example,
the following extracted sentence 1 would be converted to sentence 2:
1. Are you sending out upcoming events for this week?
2. In another subthread, on April 12, 2001, Kevin Danquoit wrote: Are you sending out upcoming
events for this week?
This wrapper module has the potential to increase the coherence of the extractive summary,
which otherwise could suffer from the fact that its concatenated sentences have been removed
from their original contexts. However, the authors did not evaluate the impact of this wrapper
text. The wrapper module can be seen as a nod towards abstractive summarization, since there is
new text describing the email content at a higher level. More precisely, this is a form of hybrid
extractive/abstractive summarization.
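A wrapper of this kind can be sketched in a few lines of Python. This is only an illustration of the idea: the `ExtractedSentence` fields, the fixed phrase "In another subthread" and the use of a single verb to stand in for the speech act are assumptions, not details of the Rambow et al. system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExtractedSentence:
    text: str
    sender: str
    sent_on: date
    new_subthread: bool = False   # hypothetical flag: sentence opens a new subthread
    verb: str = "wrote"           # hypothetical stand-in for the speech act


def wrap(sentence):
    """Prefix an extracted sentence with its sender, date and speech-act verb."""
    d = sentence.sent_on
    day = f"{d:%B} {d.day}, {d.year}"  # month name assumes the default C locale
    lead = "in another subthread, " if sentence.new_subthread else ""
    wrapped = f"{lead}on {day}, {sentence.sender} {sentence.verb}: {sentence.text}"
    return wrapped[0].upper() + wrapped[1:]


example = ExtractedSentence(
    text="Are you sending out upcoming events for this week?",
    sender="Kevin Danquoit",
    sent_on=date(2001, 4, 12),
    new_subthread=True,
)
print(wrap(example))
```

Because the wrapper text is generated from message metadata rather than copied from the email body, even this small template adds abstractive material to an otherwise extractive summary.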
Whereas the Rambow et al. system is supervised, Newman and Blitzer [2003] present an
unsupervised approach for summarizing very long email newsgroup conversations. The approach
rests on first clustering discussion messages by topic and then extracting sentences for each cluster.
Initially, each message belongs to its own cluster, and at each step of the clustering process two clusters
are combined if they are connected by the most similar sentence pair. Once clustering is completed,
sentences are selected from each cluster based on a variety of scores, which include the use of email-
specific features pertaining to the thread structure and quoted text. For example, a sentence from a
particular email message is more likely to be considered important if it is subsequently quoted in other
messages. This exploitation of quoted text is a classic example of email-specific summarization, with
the intuition being that sentences that are quoted in subsequent emails are likely to be important.
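The clustering step and the quoted-text signal can be sketched as follows. This is a rough illustration under stated assumptions, not Newman and Blitzer's implementation: Jaccard word overlap stands in for their sentence similarity measure, `min_similarity` is a hypothetical stopping threshold, and quotes are matched by exact text after normalizing case and whitespace.

```python
import re
from itertools import combinations


def words(sentence):
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity (assumed; the original measure is not given here)."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster_link(c1, c2, messages):
    """Similarity of the most similar sentence pair connecting two clusters."""
    return max(
        (jaccard(s, t)
         for i in c1 for j in c2
         for s in messages[i] for t in messages[j]),
        default=0.0,
    )


def cluster_messages(messages, min_similarity=0.3):
    """messages: list of sentence lists. Returns clusters of message indices."""
    clusters = [[i] for i in range(len(messages))]  # one cluster per message
    while len(clusters) > 1:
        # Find the two clusters joined by the most similar sentence pair.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_link(clusters[p[0]], clusters[p[1]], messages),
        )
        if cluster_link(clusters[a], clusters[b], messages) < min_similarity:
            break  # hypothetical stopping rule
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


def times_quoted(sentence, quoted_by_message):
    """Count the messages whose quoted text contains the sentence."""
    def norm(s):
        return " ".join(s.lower().split())
    target = norm(sentence)
    return sum(
        any(norm(q) == target for q in quoted)
        for quoted in quoted_by_message
    )
```

A sentence score could then combine `times_quoted` with other signals before the top sentences are selected from each cluster. How the scores are weighted is not specified above, so it is left out of the sketch.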
While Newman and Blitzer focus on newsgroup discussions, Wan and McKeown [2004]
focus on summarizing another particular type of email discussion, where the conversation represents
a decision-making process. The system works by identifying an issue sentence in the originating