Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations

that the generated summary sentence will always be a string that actually occurred in at least one of

the input sentences; in other words, it is an extractive summary.
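
The extractive property described above can be checked mechanically. The following is a minimal sketch, not the method of any system discussed here: the function name `is_extractive` is hypothetical, and lowercasing plus whitespace tokenization are simplifying assumptions.

```python
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase and split on whitespace; punctuation is not stripped.
    return text.lower().split()


def is_extractive(summary: str, input_sentences: Sequence[str]) -> bool:
    """Return True if the summary's tokens occur as one contiguous run
    inside at least one of the input sentences."""
    needle = tokenize(summary)
    if not needle:
        return False
    n = len(needle)
    for sentence in input_sentences:
        haystack = tokenize(sentence)
        # Slide a window of the summary's length over the sentence.
        for start in range(len(haystack) - n + 1):
            if haystack[start:start + n] == needle:
                return True
    return False
```

A summary that drops or reorders words from every input sentence fails this check, which is one way to separate extractive output from abstractive output under the stated assumptions.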

In 2008, the Text Analysis Conference⁴ ran a pilot task on summarizing opinions in blog

posts. However, the blog data was not conversational in nature but rather featured individual blog

posts on a set of topics. For that reason, we do not describe the task or the submitted systems in

detail here since our primary interest is conversations. It could be argued, however, that the set of

posts on a given topic is conversational in a very general sense, since they feature people blogging

on a common topic and possibly reading each other's posts, albeit not replying to each other in an

interactive fashion; this is “conversational” in the same sense that the multitude of Twitter posts on

a given topic in Sharifi et al. [ 2010 ] is conversational.

Assumptions and Inputs for Chat and Blog Summarization Systems

Summarization of online

conversations such as blogs and forum discussions has been a less researched area than meeting

and email summarization, for a variety of reasons. Firstly, online conversations have only become

widely popular in recent years. Secondly, and relatedly, there is no large, publicly

available blog corpus annotated with extractive and abstractive summaries such as exist for meetings

(the AMI and ICSI corpora) and emails (the Enron and BC3 corpora). And thirdly, the inputs,

tasks and use cases are neither clearly defined nor agreed upon.

In the work of Zhou and Hovy [ 2005 ], the data are very conversational and personal, with

participants responding directly to one another. This contrasts with the work of Sharifi et al. [ 2010 ],

where the conversation is very diffuse and spread out; people are discussing a common topic on a

massive scale, sometimes responding directly to one another but often not. Somewhere in between

is the work of Hu et al. [ 2007 ], where the goal is to summarize individual blog posts in the context

of the comment discussions. These are all large-scale, online conversations, but the datasets are

structured differently from one another and the summarization goals are different as well.

It would be advantageous for the research community to define clear blog summarization

tasks, and to facilitate the creation of an annotated blog summarization corpus related to the tasks

of interest. A well-defined task could involve summarizing blog posts themselves, blog comments,

blog links, or some combination thereof. Blogs are an interesting case because there are typically

several types of conversations happening. In a group blog, the bloggers may be posting in response

to one another. In the comments on individual posts, commenters may be carrying on discussions. Both bloggers

and commenters may also be linking to outside sources, forming a wider conversation.
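
To make the possible task inputs concrete, here is a minimal sketch of how a blog conversation might be represented and how a task definition could select posts, comments, links, or a combination. All class, field, and function names are hypothetical; they do not come from any of the corpora or systems cited here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str
    links: List[str] = field(default_factory=list)  # outside sources cited


@dataclass
class BlogPost:
    author: str
    text: str
    comments: List[Comment] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # outside sources cited


def summarization_input(post: BlogPost,
                        use_post: bool = True,
                        use_comments: bool = False,
                        use_links: bool = False) -> List[str]:
    """Collect the units a summarizer would see under one task definition."""
    units: List[str] = []
    if use_post:
        units.append(post.text)
    if use_comments:
        units.extend(comment.text for comment in post.comments)
    if use_links:
        units.extend(post.links)
        for comment in post.comments:
            units.extend(comment.links)
    return units
```

Switching the flags changes which layer of the conversation is summarized, which is the kind of choice a shared task definition would need to fix in advance.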

Measuring Informativeness for Chat and Blog Summarization Systems

Informativeness in

both Zhou and Hovy [ 2005 ] and Hu et al. [ 2007 ] is measured by considering not just the initiating

post, but responses to the post as well. In the case of Zhou and Hovy [ 2005 ] this is done by

identifying adjacency pairs in chat conversations, while in Hu et al. [ 2007 ] the researchers use blog

comments to consider the informativeness of the original post sentences. While the initial post may

⁴ http://www.nist.gov/tac/2008/
