that the generated summary sentence will always be a string that actually occurred in at least one of
the input sentences; in other words, it is an extractive summary.
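The extractive property described above can be checked mechanically. The following is a minimal sketch, not the method of any system cited here: it assumes whitespace tokenization and case-insensitive matching, and the function names `tokens` and `is_extractive` are hypothetical, introduced only for illustration.

```python
def tokens(text):
    # Assumption: lowercase, whitespace-delimited tokens are a good enough unit.
    return text.lower().split()


def is_extractive(summary, input_sentences):
    """Return True if the summary's tokens occur as one contiguous run
    inside at least one of the input sentences."""
    s = tokens(summary)
    if not s:
        return False
    for sentence in input_sentences:
        t = tokens(sentence)
        # Slide a window of len(s) tokens across the input sentence.
        for i in range(len(t) - len(s) + 1):
            if t[i:i + len(s)] == s:
                return True
    return False


# Made-up example posts, not taken from any corpus.
posts = [
    "Obama wins the election tonight",
    "so happy that Obama wins the election",
]
print(is_extractive("Obama wins the election", posts))
print(is_extractive("Obama loses the election", posts))
```

Matching on tokens rather than raw characters avoids counting a summary word as present when it only appears inside a longer word in the input.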
In 2008, the Text Analysis Conference⁴ ran a pilot task on summarizing opinions in blog
posts. However, the blog data was not conversational in nature but rather featured individual blog
posts on a set of topics. For that reason, we do not describe the task or the submitted systems in
detail here since our primary interest is conversations. It could be argued, however, that the set of
posts on a given topic is conversational in a very general sense, since they feature people blogging
on a common topic and possibly reading each other's posts, albeit not replying to each other in an
interactive fashion; this is “conversational” in the same sense that the multitude of Twitter posts on
a given topic in Sharifi et al. [ 2010 ] is conversational.
Assumptions and Inputs for Chat and Blog Summarization Systems
Summarization of online conversations such as blogs and forum discussions has been a less
researched area than meeting and email summarization, for a variety of reasons. Firstly, online
conversations have only become widely popular in recent years. Secondly, and relatedly, there is
no large, publicly available blog corpus annotated with extractive and abstractive summaries such
as exist for meetings (the AMI and ICSI corpora) and emails (the Enron and BC3 corpora). And
thirdly, the inputs, tasks and use cases are neither clearly defined nor agreed upon.
In the work of Zhou and Hovy [ 2005 ], the data are very conversational and personal, with
participants responding directly to one another. This contrasts with the work of Sharifi et al. [ 2010 ],
where the conversation is very diffuse and spread out; people are discussing a common topic on a
massive scale, sometimes responding directly to one another but often not. Somewhere in between
is the work of Hu et al. [ 2007 ], where the goal is to summarize individual blog posts in the context
of the comment discussions. These are all large-scale, online conversations, but the datasets are
structured differently from one another and the summarization goals are different as well.
It would be advantageous for the research community to define clear blog summarization
tasks, and to facilitate the creation of an annotated blog summarization corpus related to the tasks
of interest. A well-defined task could involve summarizing blog posts themselves, blog comments,
blog links, or some combination thereof. Blogs are an interesting case because there are typically
several types of conversations happening. In a group blog, the bloggers may be posting in response
to one another. Under individual posts, commenters may be carrying on discussions. Both bloggers
and commenters may also be linking to outside sources, forming a wider conversation.
Measuring Informativeness for Chat and Blog Summarization Systems
Informativeness in both Zhou and Hovy [ 2005 ] and Hu et al. [ 2007 ] is measured by considering
not just the initiating post, but responses to the post as well. In the case of Zhou and Hovy [ 2005 ], this is done by
identifying adjacency pairs in chat conversations, while in Hu et al. [ 2007 ] the researchers use blog
comments to consider the informativeness of the original post sentences. While the initial post may
4 http://www.nist.gov/tac/2008/