Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations


email and extracting reply sentences to that issue from subsequent emails, for each participant. The issue sentence is determined by comparing each candidate sentence vector in the originating email to a comparison vector representing all replies; the issue sentence is the candidate most similar to the comparison vector. The authors consider several ways of constructing the comparison vector. These include a standard centroid (a feature vector representing the document, with significant terms represented using term weights such as tf.idf or normalized frequency), a centroid with singular value decomposition applied (mapping the sentences to a lower dimensionality, revealing core “concepts”), and a combined voting approach. Response sentences are simply selected by taking the first sentence of the replies from each participant. An example summary from Wan and McKeown is shown below:

Issue: Let me know if you agree or disagree w/choice of plaque and (especially) wording.

Response 1: I like the plaque, and aside for exchanging Dana's name for “Sally Slater” and ACM for “Ladies Auxiliary,” the wording is nice.

Response 2: I prefer Christy's wording to the plaque original.
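
The issue/response selection described above can be sketched in a few lines of Python. This is a minimal sketch, not Wan and McKeown's implementation. It assumes a simple regex tokenizer and sentence splitter, a smoothed tf.idf weighting computed over the candidate sentences plus the reply emails, and cosine similarity against a plain centroid of the replies. The SVD and voting variants of the comparison vector are not shown. All function names and the toy thread are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on whitespace that follows '.', '?' or '!' (assumed splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def tfidf_vectors(docs):
    """One sparse tf.idf vector (dict) per token list, using smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def centroid(vectors):
    """Average of sparse vectors: the plain (non-SVD) comparison vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def issue_and_responses(root_email, replies):
    """root_email: str; replies: (participant, text) pairs in thread order."""
    candidates = split_sentences(root_email)
    docs = [tokenize(s) for s in candidates] + [tokenize(t) for _, t in replies]
    vectors = tfidf_vectors(docs)
    candidate_vecs = vectors[:len(candidates)]
    comparison = centroid(vectors[len(candidates):])
    # Issue: the root-email sentence most similar to the replies' centroid.
    issue = max(zip(candidates, candidate_vecs),
                key=lambda pair: cosine(pair[1], comparison))[0]
    # Response: first sentence of each participant's first non-empty reply.
    responses = {}
    for participant, text in replies:
        sentences = split_sentences(text)
        if participant not in responses and sentences:
            responses[participant] = sentences[0]
    return issue, list(responses.values())


if __name__ == "__main__":
    root = ("Hi all. The plaque draft is attached. "
            "Let me know if you agree or disagree with the plaque wording.")
    replies = [
        ("alice", "I like the plaque wording. Nice work."),
        ("bob", "I prefer the original plaque wording. Thanks."),
        ("alice", "Also, see you Friday."),
    ]
    issue, responses = issue_and_responses(root, replies)
    print("Issue:", issue)
    for i, response in enumerate(responses, 1):
        print(f"Response {i}:", response)
```

Swapping `centroid` for an SVD-projected centroid would change only how `comparison` is built. The first-sentence rule for responses stays the same.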

Nenkova and Bagga [2003] aim to create indicative summaries of an email thread, providing enough information about a thread to allow a user to decide whether to retrieve the entire thread for browsing. Each summary begins with the subject line of the root email. A sentence from the root email is then selected based on the overlap of its content terms with the subject line. The remaining summary sentences are chosen by selecting, for each reply email, the sentence with the highest overlap of content terms with the entire root email. The resulting summary is meant to describe the subject of the thread, a statement of a problem or information request, and a brief digest of the immediate responses to that statement. The authors found that this worked well on their particular dataset, the Pine-Info mailing list,¹ because threads there typically begin with a user asking for help on a particular problem and then receive numerous suggestions.
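
The Nenkova and Bagga selection steps can be sketched as follows. This is a minimal sketch under two assumptions. First, "content terms" means lowercase words outside a small, hypothetical stopword list. Second, overlap is the number of shared distinct terms, with ties going to the earlier sentence. The helper names and the sample thread are illustrative and are not taken from the original system.

```python
import re

# Hypothetical stopword list; the original notion of "content terms" may differ.
STOPWORDS = {
    "a", "an", "the", "and", "or", "to", "of", "in", "on", "for", "is", "it",
    "i", "you", "my", "me", "with", "this", "that", "how", "do", "can", "be", "not",
}


def content_terms(text):
    """Distinct lowercase words that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def split_sentences(text):
    """Split on whitespace that follows '.', '?' or '!' (assumed splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def best_overlap(sentences, reference_terms):
    """Sentence sharing the most content terms with reference_terms; ties keep the earliest."""
    return max(sentences, key=lambda s: len(content_terms(s) & reference_terms))


def indicative_summary(subject, root_body, reply_bodies):
    summary = [subject]  # 1. subject line of the root email
    # 2. root-email sentence with the most content-term overlap with the subject
    summary.append(best_overlap(split_sentences(root_body), content_terms(subject)))
    # 3. for each reply, the sentence overlapping most with the entire root email
    root_terms = content_terms(root_body)
    for body in reply_bodies:
        sentences = split_sentences(body)
        if sentences:
            summary.append(best_overlap(sentences, root_terms))
    return summary


if __name__ == "__main__":
    for line in indicative_summary(
        "Pine cannot open my inbox",
        "Hello everyone. Since upgrading, Pine cannot open my inbox folder. Any ideas?",
        [
            "Check the permissions on your inbox folder. Good luck.",
            "Thanks for asking. Try removing the lock file for the inbox folder.",
        ],
    ):
        print(line)
```

Because each reply contributes exactly one sentence, summary length grows linearly with the number of replies. That suits short help-request threads like those described above.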

Carenini et al. [2007] created an unsupervised email thread summarization system based on clue words. Their approach relies on the conversation structure of the emails and the repeated words throughout the thread. The email conversation is represented as a graph structure with email fragments as nodes (i.e., the Fragment Quotation Graph introduced in Section 3.4.5). Clue words are the highly informative words that occur in adjacent nodes of the graph. This system exemplifies the general idea of representing conversations as graphs, where nodes can represent sentences, fragments, or conversation participants.
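
The clue-word idea can be sketched over a toy Fragment Quotation Graph. The scoring rule here is an assumption for illustration, not necessarily Carenini et al.'s exact formulation. Each sentence is scored by summing, for every non-stopword it contains, how often that word occurs in the fragments adjacent to the sentence's own fragment. Words that occur in no adjacent fragment contribute nothing. Fragment IDs, edges, and the stopword list are hypothetical, and no stemming is applied.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens (assumed tokenizer, no stemming)."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentences(fragments, edges, stopwords=frozenset()):
    """Score sentences by clue words shared with adjacent FQG fragments.

    fragments: fragment id -> list of sentences (hypothetical input format)
    edges: (quoting, quoted) fragment pairs; direction is ignored here
    Returns a list of (sentence, score) in fragment order.
    """
    neighbours = {f: set() for f in fragments}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Word frequencies per fragment, stopwords excluded.
    freq = {
        f: Counter(w for s in sents for w in tokens(s) if w not in stopwords)
        for f, sents in fragments.items()
    }
    scored = []
    for f, sents in fragments.items():
        for s in sents:
            # Each word occurrence adds its frequency in neighbouring fragments,
            # so only words that recur next door (clue words) raise the score.
            score = sum(
                freq[n][w]
                for w in tokens(s) if w not in stopwords
                for n in neighbours[f]
            )
            scored.append((s, score))
    return scored


def clue_word_summary(fragments, edges, k, stopwords=frozenset()):
    """Top-k sentences by clue-word score (stable sort keeps thread order on ties)."""
    ranked = sorted(score_sentences(fragments, edges, stopwords),
                    key=lambda pair: pair[1], reverse=True)
    return [s for s, _ in ranked[:k]]


if __name__ == "__main__":
    fragments = {
        "a": ["Should we move the meeting to Friday?", "The room is booked on Thursday."],
        "b": ["Friday works for me.", "I will bring slides."],
        "c": ["Friday is fine, but the room must change."],
    }
    edges = [("b", "a"), ("c", "a")]  # b and c each quote fragment a
    stop = {"the", "to", "we", "is", "on", "for", "me", "i", "will", "but", "must", "should"}
    print(clue_word_summary(fragments, edges, k=2, stopwords=stop))
```

Here "Friday" and "room" recur across the quotation edges, so the sentences containing them rank highest. "I will bring slides." shares nothing with its neighbour and scores zero.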

Assumptions and Inputs for Email Summarization Systems

As noted previously, email summarization systems differ in whether the input document to the summarizer is a single email or an email thread. Here, we consider systems that summarize partial or entire threads. Systems also differ in whether they expect a user-supplied query or solely the document to be summarized.

Many of the systems we mentioned are designed based on assumptions about the nature and purpose of the email conversations. For example, the system of Wan and McKeown [2004] assumes

1 http://www.washington.edu/pine/pine-info/
