Chat and Blog Summarization Case Studies In Zhou and Hovy [2005], the authors address the
task of automatically summarizing internet relay chats, using online discussions pertaining to the
GNU/Linux project.3 These discussions actually consist of both chats and emails. An interesting
facet of the data is that the online community provided digests of its own discussions, including
quotes and hyperlinks. These served as naturally occurring gold standard summaries for training
and evaluation purposes.
The approach presented in Zhou and Hovy [2005] is to first segment and cluster the message
data, and then identify adjacency pairs in the text (see Chapter 3 for more details on these tasks). An
example of an adjacency pair is a question-answer pair, where a person raised a question and another
person subsequently answered that question. A mini-summary is generated for each topic, where a
topic is represented by a cluster of messages. Each mini-summary consists of an initializing segment
of an adjacency pair followed by one or more responding segments. A supervised approach is taken,
comparing maximum entropy and SVM models [Poole and Mackworth, 2010] with simple lexical
and structural features. The SVM classifier was found to outperform the maximum entropy model.
Hu et al. [2007] is one of the first examples of blog summarization to consider the blog comments
as an informative piece of data. While the extractive summaries they generate are summaries
of the original blog posts only, the authors weight blog post sentences highly if their constituent
words appear often in widely quoted comments and are often used by authoritative readers. They
also conducted a user study and found the interesting result that human summary annotators will
change their sentence selection decisions for a blog post when they are allowed to read blog comments
in addition to reading the blog post itself. This tells us that blog commenters play a large role
in highlighting, and even determining, what is most salient in a blog post.
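A minimal sketch of this comment-informed sentence weighting follows, assuming a simple word-overlap scheme and hypothetical per-reader authority scores; Hu et al.'s actual model derives comment weights from quotation patterns and reader authority measures:

```python
import re
from collections import Counter

def score_sentences(post_sentences, comments, authority):
    """Score each blog-post sentence by how often its words occur in
    the comments, weighting each comment by its author's authority.
    The authority values here are hypothetical stand-ins."""
    weights = Counter()
    for author, text in comments:
        for w in re.findall(r"\w+", text.lower()):
            weights[w] += authority.get(author, 1.0)
    # A sentence scores highly when its words recur in weighted comments.
    return [(sum(weights[w] for w in re.findall(r"\w+", s.lower())), s)
            for s in post_sentences]

post = ["The new scheduler improves latency.",
        "I also repainted my office this weekend."]
comments = [("expert", "Latency dropped once the scheduler patch landed."),
            ("lurker", "Nice office!")]
scores = score_sentences(post, comments, {"expert": 3.0, "lurker": 0.5})
```

As in the study, the sentence echoed by the authoritative commenter outscores the off-topic one, even though both receive some comment attention.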
Whereas Hu et al. consider longer blog posts and comments, in the work of Sharifi et al.
[2010], the task is automatically summarizing microblogs such as Twitter messages. Given a trending
(i.e., currently popular) topic and a set of messages, or tweets, concerning that topic, their system
produces a very brief, one-sentence summary of the topic. This data is not conversational in the
sense of meetings, emails or chats, where participants are directly engaging and responding to one
another, but it is conversational in that many thousands of users are simultaneously tweeting about
the same topic, forming a massive community conversation. The purpose of the summary is to
concisely convey why the topic is trending. For example, the topic Ted Kennedy might be trending
because Ted Kennedy died, and so the system will generate a summary such as A tragedy: Ted Kennedy
died today. The algorithm takes as input a topic phrase (e.g., Ted Kennedy) and a set of sentences
from relevant tweets, and builds a graph representing word sequences that occur before and after
the topic phrase. Individual word nodes are weighted according to their occurrence count in that
position and their distance from the root node. Generating a summary consists of finding the path
with the largest weight from the root topic phrase to a non-root node. The root node is reinitialized
with this partial path and the rest of the sentence is generated by again finding the path with the
largest weight from this new root node to a non-root node. One effect of this implementation is
3 http://www.gnu.org
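The path-growing procedure described above can be sketched as follows. This is a deliberately simplified, one-sided version: it only grows the path to the right of the topic phrase and uses an assumed linear distance penalty, whereas the actual system also searches to the left of the phrase and reinitializes the root with the partial path:

```python
from collections import Counter

def best_continuation(sentences, phrase, max_len=6):
    """Greedily grow the highest-weight word path to the right of the
    topic phrase. Node weight = occurrence count at that offset minus
    a small distance penalty (0.1 per step, an assumed value)."""
    path = []
    for offset in range(max_len):
        prefix = phrase + (" " + " ".join(path) if path else "")
        pw = prefix.lower().split()
        counts = Counter()
        for s in sentences:
            words = s.lower().split()
            # Count each word observed immediately after the current path.
            for i in range(len(words) - len(pw)):
                if words[i:i + len(pw)] == pw:
                    counts[words[i + len(pw)]] += 1
        if not counts:
            break
        word, c = counts.most_common(1)[0]
        if c - 0.1 * offset <= 0:  # distance penalty ends weak paths
            break
        path.append(word)
    return phrase + " " + " ".join(path) if path else phrase

tweets = ["ted kennedy died today in boston",
          "so sad ted kennedy died today",
          "ted kennedy died at age 77"]
summary = best_continuation(tweets, "ted kennedy")
```

Because "died" follows the phrase in all three tweets, it anchors the path, and the summary grows word by word along the most heavily reinforced sequence.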