email and extracting reply sentences to that issue from subsequent emails, for each participant.
The issue sentence is determined by comparing each candidate sentence vector in the originating
email to a comparison vector representing all replies, with the issue sentence being most similar to
the comparison vector. The authors consider several ways of constructing the comparison vector,
such as a standard centroid (a feature vector representing the document, with significant terms
represented using term-weights such as tf.idf or normalized frequency), a centroid with singular value
decomposition applied (mapping the sentences to a lower dimensionality, revealing core “concepts”)
and a combined voting approach. Response sentences are simply selected by taking the first sentence
of the replies from each participant. An example summary from Wan & McKeown is shown below:
Issue: Let me know if you agree or disagree w/choice of plaque and (especially)
wording.
Response 1: I like the plaque, and aside for exchanging Dana's name for “Sally
Slater” and ACM for “Ladies Auxiliary,” the wording is nice.
Response 2: I prefer Christy's wording to the plaque original.
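The issue-sentence selection described above can be sketched in a few lines. This is a minimal illustration of the standard-centroid variant only, not the authors' implementation: the function names, the whitespace tokenizer, and the smoothed idf formula are all assumptions made for the example, and the SVD and voting variants are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(token_lists):
    """Build one sparse tf.idf vector (a dict) per tokenized sentence."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed idf (an assumption of this sketch) keeps weights positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def centroid(vectors):
    """Average a non-empty list of sparse vectors into a comparison vector."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_issue_sentence(root_sentences, reply_sentences):
    """Return the root-email sentence most similar to the reply centroid.

    Both arguments are lists of sentence strings; reply_sentences must be
    non-empty.
    """
    all_tokens = [tokenize(s) for s in root_sentences + reply_sentences]
    vectors = tfidf_vectors(all_tokens)
    root_vectors = vectors[:len(root_sentences)]
    comparison = centroid(vectors[len(root_sentences):])
    best = max(range(len(root_sentences)),
               key=lambda i: cosine(root_vectors[i], comparison))
    return root_sentences[best]
```

Here document frequencies are computed over the sentences of the thread itself, which is a simplification; a larger background corpus could be substituted without changing the selection step.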
Nenkova and Bagga [2003] aim to create indicative summaries of an email thread, providing
enough information about a thread to allow a user to decide whether or not to retrieve the entire
thread for browsing. Each summary begins with the subject line of the root email. A sentence from
the root email is then selected based on the overlap of its content terms with the subject line. The
remaining summary sentences are chosen by selecting, for each reply email, the sentence that has
the highest overlap of content terms with the entirety of the root email. The aim is that the resultant
summary will describe the subject of the thread, a statement of a problem or information request,
and a brief digest of the immediate responses to that statement. The authors found that this worked
well on their particular dataset, the Pine-Info mailing list¹, because threads typically begin with a
user asking for help on a particular problem and receiving numerous suggestions.
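The overlap-based selection described above can be illustrated with a short sketch. It is not the authors' code: the stopword list, the tokenizer, and the use of a raw count of shared content terms as the overlap measure are assumptions made for the example.

```python
import re

# A tiny illustrative stopword list (an assumption; a real system would use
# a fuller list or another definition of "content term").
STOPWORDS = {"a", "an", "the", "on", "in", "of", "for", "to", "and", "or",
             "i", "my", "me", "is", "it", "this", "after", "any"}


def content_terms(text):
    """Return the set of lowercase non-stopword tokens in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap(sentence, reference_terms):
    """Count the content terms a sentence shares with a reference term set."""
    return len(content_terms(sentence) & reference_terms)


def indicative_summary(subject, root_sentences, replies):
    """Build a summary: subject line, one root sentence, one sentence per reply.

    root_sentences is a list of sentence strings; replies is a list of such
    lists, one per reply email. Ties go to the earliest sentence.
    """
    summary = [subject]
    subject_terms = content_terms(subject)
    summary.append(max(root_sentences,
                       key=lambda s: overlap(s, subject_terms)))
    root_terms = content_terms(" ".join(root_sentences))
    for reply_sentences in replies:
        if reply_sentences:
            summary.append(max(reply_sentences,
                               key=lambda s: overlap(s, root_terms)))
    return summary
```

Because no stemming is applied in this sketch, "upgraded" and "upgrading" count as different terms; normalizing word forms would be a natural refinement.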
Carenini et al. [2007] created an unsupervised email thread summarization system based on
clue words. Their approach relies on the conversation structure of the emails and the repeated words
throughout the thread. The email conversation is represented as a graph structure with email
fragments as nodes (i.e., the Fragment Quotation Graph introduced in Section 3.4.5). Clue words are
the highly informative words that occur in adjacent nodes of the graph. This system exemplifies the
general idea of representing conversations as graphs, where nodes can represent sentences, fragments
or conversation participants.
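As a rough illustration of the clue-word idea, the sketch below finds the words of a fragment that also occur in adjacent nodes of a small fragment graph. The scoring rule (counting occurrences in adjacent fragments), the stopword list, the absence of stemming, and all function names are assumptions of this example; the text above does not specify how the original system weights clue words.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "is", "i", "for", "to", "of", "and", "on"}


def terms(text):
    """Return the lowercase non-stopword tokens of a fragment, in order."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def neighbors(node, edges):
    """Return the nodes adjacent to `node`, given (parent, child) edges."""
    adjacent = set()
    for parent, child in edges:
        if parent == node:
            adjacent.add(child)
        elif child == node:
            adjacent.add(parent)
    return adjacent


def clue_scores(node, fragments, edges):
    """Map each clue word of `node` to its occurrence count in adjacent nodes.

    fragments maps node ids to fragment text. A word in the fragment counts
    as a clue word only if it also appears in at least one adjacent fragment.
    """
    neighbor_counts = Counter()
    for other in neighbors(node, edges):
        neighbor_counts.update(terms(fragments[other]))
    return {w: neighbor_counts[w]
            for w in set(terms(fragments[node]))
            if neighbor_counts[w] > 0}
```

Under these assumptions, a sentence could then be ranked by summing the scores of the clue words it contains, with the highest-ranked sentences forming the summary.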
Assumptions and Inputs for Email Summarization Systems. As noted previously, email
summarization systems differ in whether the input document to the summarizer is a single email or an email
thread. Here, we consider systems that summarize partial or entire threads. Systems also differ in
whether they expect a user-supplied query or solely the document to be summarized.
Many of the systems we mentioned are designed based on assumptions about the nature and
purpose of the email conversations. For example, the system of Wan and McKeown [2004] assumes
1 http://www.washington.edu/pine/pine-info/