contain the bulk of the discussion, in each case it is the ensuing conversation that illuminates what
is important.
In Sharifi et al. [2010], the conversation is massive in scale, with hundreds or thousands
of participants talking about a particular person, event or thing. Informativeness is determined by
analyzing the many lexical patterns used to refer to that topic. The final summary is a crowd-sourced
digest of the trending item.
A source of information available for online conversations is the presence of links within the
conversation. Sentences could be weighted not only by their lexical and structural features within
the given conversation, but also by whether they link to other documents and by the content of those
other documents.
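The link-based weighting described above can be sketched roughly as follows. This is a minimal illustration and not a method taken from any cited system: the function names, the `link_bonus` parameter, the idf-style lexical score, and the `linked_docs` mapping (URL to already-fetched text) are all assumptions made for the example.

```python
import math
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")


def tokenize(text):
    """Return lowercase word tokens, with any URLs removed first."""
    return re.findall(r"[a-z']+", URL_PATTERN.sub(" ", text.lower()))


def score_sentences(sentences, linked_docs=None, link_bonus=0.5):
    """Score each sentence by a lexical term plus a bonus for links.

    linked_docs is a hypothetical mapping from URL to the text of the
    linked document; link_bonus is an arbitrary illustrative weight.
    """
    linked_docs = linked_docs or {}
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences containing each term.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vocab = set(df)
    scores = []
    for sentence, tokens in zip(sentences, token_lists):
        # Lexical part: mean idf of the sentence's tokens.
        lexical = 0.0
        if tokens:
            lexical = sum(math.log(n / df[t]) for t in tokens) / len(tokens)
        bonus = 0.0
        for url in URL_PATTERN.findall(sentence):
            url = url.rstrip(".,;:!?)")
            # Reward the mere presence of a link.
            bonus += link_bonus
            # Reward links whose content overlaps the conversation vocabulary.
            doc_tokens = set(tokenize(linked_docs.get(url, "")))
            if doc_tokens:
                bonus += link_bonus * len(doc_tokens & vocab) / len(doc_tokens)
        scores.append(lexical + bonus)
    return scores
```

In this sketch a sentence that contains a link always outranks an otherwise identical sentence without one, and the margin grows when the linked document shares vocabulary with the conversation. A real system would need to tune the relative weight of the two components.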
Outputs and Interfaces for Chat and Blog Summarization Systems
If the input conversation has a
threaded structure, an extractive summary of the conversation will likely need to be threaded as well
to maintain coherence. An abstractive system might generate separate paragraphs for each thread,
or try to aggregate similar threads.
Once an online conversation exceeds a certain size in terms of participants and number of
comments, extraction alone is probably not sufficient to characterize the discussion accurately. One
strategy is to aggregate similar comments and generate new text to describe them, while presenting
a few of the original sentences as examples. This would constitute a hybrid extractive-abstractive
system.
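As a rough sketch of such a hybrid approach, the code below groups similar comments, generates a short templated description for each group, and quotes one original comment as an example. The greedy Jaccard clustering, the `threshold` value, and the output template are assumptions chosen for brevity, not a technique prescribed by the text; a practical system would at least filter stop words and use a stronger similarity measure.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a comment."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_comments(comments, threshold=0.3):
    """Greedy single-pass clustering.

    Each comment joins the first cluster whose seed (first) comment is
    similar enough; otherwise it starts a new cluster. Returns lists of
    comment indices.
    """
    clusters = []
    for i, comment in enumerate(comments):
        toks = tokens(comment)
        for cluster in clusters:
            if jaccard(toks, tokens(comments[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def hybrid_summary(comments, threshold=0.3, examples_per_cluster=1):
    """Describe each cluster with generated text plus quoted examples."""
    lines = []
    clusters = sorted(cluster_comments(comments, threshold), key=len, reverse=True)
    for cluster in clusters:
        members = [tokens(comments[i]) for i in cluster]
        # Words shared by every comment in the cluster act as a crude topic label.
        shared = sorted(set.intersection(*members))
        topic = ", ".join(shared) if shared else "a related point"
        lines.append(f"{len(cluster)} comment(s) mention: {topic}")
        for i in cluster[:examples_per_cluster]:
            lines.append(f'  e.g. "{comments[i]}"')
    return lines
```

The generated line is the abstractive part and the quoted comment is the extractive part; larger clusters are listed first so that the dominant opinions lead the summary.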
For massively large online conversations, visualizations can also be a good complement to the
text. For summarizing thousands, or even millions, of tweets, a mix of information visualizations and
word clouds, such as that shown in Figure 3.7, can be very effective. One could also visualize clusters of conversation
participants to easily see which individuals are interacting the most.
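The data behind such a participant visualization could be computed along the following lines. This is only a sketch: the input format (a list of `(author, replied_to)` pairs) and the function names are assumptions, and the rendering step is left to whichever plotting or graph library is at hand.

```python
from collections import Counter


def interaction_counts(replies):
    """Count undirected interactions between pairs of participants.

    replies is an assumed list of (author, replied_to) name pairs.
    Self-replies are ignored.
    """
    pairs = Counter()
    for author, target in replies:
        if author != target:
            pairs[tuple(sorted((author, target)))] += 1
    return pairs


def most_active(replies, top_n=3):
    """Rank participants by their total number of interactions."""
    totals = Counter()
    for (a, b), count in interaction_counts(replies).items():
        totals[a] += count
        totals[b] += count
    return totals.most_common(top_n)
```

The pair counts can serve as edge weights in a participant graph, and the per-participant totals as node sizes.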
4.4 SUMMARIZING MULTI-DOMAIN CONVERSATIONS
The summarization systems discussed up until this point have primarily been designed with particular
domains in mind and attempt to harness unique features of those domains. For example, meeting
summarization systems often use prosodic features while email summarizers derive metadata from
the email headers. In contrast, other summarization systems have been designed to work across a
variety of conversation domains and modalities. Here, we briefly discuss several such multi-domain
systems.
In early work on conversation summarization, Zechner [2002] investigates summarizing several
genres of speech, including spontaneous meeting speech. While this work focuses on spoken
modalities, the system is not speech-specific and could be applied to written conversations as well.
Though relevance detection in his work relies largely on tf.idf scores, Zechner also explores
cross-speaker information linking and question-answer detection, so that utterances can be extracted not
only for their high tf.idf scores, but also when they are linked to other informative utterances. This