Chat and Blog Summarization Case Studies In Zhou and Hovy [2005], the authors address the
task of automatically summarizing internet relay chats, using online discussions pertaining to the
GNU/Linux project.3 These discussions actually consist of both chats and emails. An interesting
facet of the data is that the online community provided digests of its own discussions, including
quotes and hyperlinks. These served as naturally occurring gold standard summaries for training
and evaluation purposes.
The approach presented in Zhou and Hovy [2005] is to first segment and cluster the message
data, and then identify adjacency pairs in the text (see Chapter 3 for more details on these tasks). An
example of an adjacency pair is a question-answer pair, where a person raised a question and another
person subsequently answered that question. A mini-summary is generated for each topic, where a
topic is represented by a cluster of messages. Each mini-summary consists of an initializing segment
of an adjacency pair followed by one or more responding segments. A supervised approach is taken,
comparing maximum entropy and SVM models [Poole and Mackworth, 2010] with simple lexical
and structural features. The SVM classifier was found to outperform the maximum entropy model.
Hu et al. [2007] is one of the first examples of blog summarization to consider the blog comments
as an informative piece of data. While the extractive summaries they generate are summaries
of the original blog posts only, the authors weight blog post sentences highly if their constituent
words appear often in widely quoted comments and are often used by authoritative readers. They
also conducted a user study and found the interesting result that human summary annotators will
change their sentence selection decisions for a blog post when they are allowed to read blog comments
in addition to reading the blog post itself. This tells us that blog commenters play a large role
in highlighting, and even determining, what is most salient in a blog post.
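A minimal sketch of this comment-informed sentence weighting follows, assuming a simple word-overlap scheme and hypothetical per-reader authority scores; Hu et al.'s actual model derives comment weights from quotation patterns and reader authority measures:

```python
import re
from collections import Counter

def score_sentences(post_sentences, comments, authority):
    """Score each blog-post sentence by how often its words occur in
    the comments, weighting each comment by its author's authority.
    The authority values here are hypothetical stand-ins."""
    weights = Counter()
    for author, text in comments:
        for w in re.findall(r"\w+", text.lower()):
            weights[w] += authority.get(author, 1.0)
    # A sentence scores highly when its words recur in weighted comments.
    return [(sum(weights[w] for w in re.findall(r"\w+", s.lower())), s)
            for s in post_sentences]

post = ["The new scheduler improves latency.",
        "I also repainted my office this weekend."]
comments = [("expert", "Latency dropped once the scheduler patch landed."),
            ("lurker", "Nice office!")]
scores = score_sentences(post, comments, {"expert": 3.0, "lurker": 0.5})
```

As in the study, the sentence echoed by the authoritative commenter outscores the off-topic one, even though both receive some comment attention.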
Whereas Hu et al. consider longer blog posts and comments, in the work of Sharifi et al.
[2010], the task is automatically summarizing microblogs such as Twitter messages. Given a trending
(i.e., currently popular) topic and a set of messages, or tweets, concerning that topic, their system
produces a very brief, one-sentence summary of the topic. This data is not conversational in the
sense of meetings, emails or chats, where participants are directly engaging and responding to one
another, but it is conversational in that many thousands of users are simultaneously tweeting about
the same topic, forming a massive community conversation. The purpose of the summary is to
concisely convey why the topic is trending. For example, the topic Ted Kennedy might be trending
because Ted Kennedy died, and so the system will generate a summary such as A tragedy: Ted Kennedy
died today. The algorithm takes as input a topic phrase (e.g., Ted Kennedy) and a set of sentences
from relevant tweets, and builds a graph representing word sequences that occur before and after
the topic phrase. Individual word nodes are weighted according to their occurrence count in that
position and their distance from the root node. Generating a summary consists of finding the path
with the largest weight from the root topic phrase to a non-root node. The root node is reinitialized
with this partial path and the rest of the sentence is generated by again finding the path with the
largest weight from this new root node to a non-root node. One effect of this implementation is
3 http://www.gnu.org
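The path-growing procedure described above can be sketched as follows. This is a deliberately simplified, one-sided version: it only grows the path to the right of the topic phrase and uses an assumed linear distance penalty, whereas the actual system also searches to the left of the phrase and reinitializes the root with the partial path:

```python
from collections import Counter

def best_continuation(sentences, phrase, max_len=6):
    """Greedily grow the highest-weight word path to the right of the
    topic phrase. Node weight = occurrence count at that offset minus
    a small distance penalty (0.1 per step, an assumed value)."""
    path = []
    for offset in range(max_len):
        prefix = phrase + (" " + " ".join(path) if path else "")
        pw = prefix.lower().split()
        counts = Counter()
        for s in sentences:
            words = s.lower().split()
            # Count each word observed immediately after the current path.
            for i in range(len(words) - len(pw)):
                if words[i:i + len(pw)] == pw:
                    counts[words[i + len(pw)]] += 1
        if not counts:
            break
        word, c = counts.most_common(1)[0]
        if c - 0.1 * offset <= 0:  # distance penalty ends weak paths
            break
        path.append(word)
    return phrase + " " + " ".join(path) if path else phrase

tweets = ["ted kennedy died today in boston",
          "so sad ted kennedy died today",
          "ted kennedy died at age 77"]
summary = best_continuation(tweets, "ted kennedy")
```

Because "died" follows the phrase in all three tweets, it anchors the path, and the summary grows word by word along the most heavily reinforced sequence.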