Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations

contain the bulk of the discussion, in each case it is the ensuing conversation that illuminates what

is important.

In Sharifi et al. [ 2010 ], the conversation is of a massive scale, with hundreds or thousands

of participants talking about a particular person, event or thing. Informativeness is determined by

analyzing the many lexical patterns used to refer to that topic. The final summary is a crowd-sourced

digest of the trending item.

A source of information available for online conversations is the presence of links within the

conversation. Sentences could be weighted not only by their lexical and structural features within

the given conversation, but also by whether they link to other documents and by the content of those

other documents.
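The link-based weighting described above can be sketched as follows. This is a minimal illustration rather than a method from any cited system: the lexical score (average in-conversation word frequency), the link score (share of the linked document's vocabulary that also occurs in the conversation), the `link_weight` parameter, and the `linked_docs` mapping are all assumptions made for the example.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentences(sentences, linked_docs, link_weight=0.5):
    """Score sentences by a lexical component plus a link-based bonus.

    sentences   -- sentence strings from one conversation
    linked_docs -- hypothetical dict mapping a URL to the linked document's text
    link_weight -- illustrative weight for the link component (an assumption)
    """
    # Conversation-wide word counts, with URLs stripped out first.
    conv_counts = Counter(
        tok for s in sentences for tok in tokenize(URL_PATTERN.sub(" ", s))
    )
    conv_vocab = set(conv_counts)

    scores = []
    for s in sentences:
        tokens = tokenize(URL_PATTERN.sub(" ", s))
        # Lexical component: average conversation frequency of the words used.
        lexical = sum(conv_counts[t] for t in tokens) / len(tokens) if tokens else 0.0

        # Link component: how much of each linked document's vocabulary
        # also appears in the conversation.
        link = 0.0
        for url in URL_PATTERN.findall(s):
            doc_vocab = set(tokenize(linked_docs.get(url.rstrip(".,;:!?)"), "")))
            if doc_vocab:
                link += len(doc_vocab & conv_vocab) / len(doc_vocab)

        scores.append(lexical + link_weight * link)
    return scores
```

With this scoring, a sentence that links to a document closely related to the discussion is ranked above an otherwise similar sentence with no link. A real system would likely replace both components with learned feature weights.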

Outputs and Interfaces for Chat and Blog Summarization Systems

If the input conversation has a

threaded structure, an extractive summary of the conversation will likely need to be threaded as well

to maintain coherence. An abstractive system might generate separate paragraphs for each thread,

or try to aggregate similar threads.

Once an online conversation exceeds a certain size in terms of participants and number of

comments, extraction alone is probably not sufficient to accurately characterize the discussion. One

strategy is to aggregate similar comments and generate new text to describe them, while presenting

a few of the original sentences as examples. This would constitute a hybrid extractive-abstractive

system.
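One way to picture such a hybrid system is the sketch below. It is only an illustration under stated assumptions: comments are grouped with a greedy Jaccard-similarity pass (the `threshold` value and the small `STOPWORDS` set are arbitrary choices), and a fixed template stands in for genuine abstractive text generation. Each generated line is paired with one original comment as the extracted example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "is", "it", "this", "that", "to", "of", "and", "i", "in", "for", "on"}


def content_words(text):
    """Return the set of lowercase non-stopword tokens in a comment."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_comments(comments, threshold=0.3):
    """Greedy single-pass clustering: a comment joins the first cluster
    whose seed (first) comment is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of comment indices
    for i, comment in enumerate(comments):
        words = content_words(comment)
        for cluster in clusters:
            if jaccard(words, content_words(comments[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def describe(comments, clusters, n_terms=3):
    """Generate one templated line per cluster, largest cluster first,
    quoting the cluster's seed comment as an extracted example."""
    lines = []
    for cluster in sorted(clusters, key=len, reverse=True):
        counts = Counter(w for i in cluster for w in content_words(comments[i]))
        # Most frequent words first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        terms = ", ".join(w for w, _ in ranked[:n_terms])
        lines.append(f'{len(cluster)} comment(s) about {terms}; e.g. "{comments[cluster[0]]}"')
    return lines
```

Calling `describe(comments, cluster_comments(comments))` yields a short digest in which the counts and key terms are generated and the quoted examples are extracted.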

For massively large online conversations, visualizations can also be a good complement to the

text. For summarizing thousands, or even millions, of tweets, a mix of information visualizations and

word clouds, such as Figure 3.7, can be very effective. One could also visualize clusters of conversation

participants to easily see which individuals are interacting the most.
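Before participant clusters can be drawn, the interactions themselves have to be counted. The sketch below shows one simple way to do that, assuming the conversation is available as hypothetical `(author, replied_to)` pairs; the resulting counts could serve as edge weights and node sizes in a graph visualization, which is not rendered here.

```python
from collections import Counter


def interaction_counts(replies):
    """Count undirected interactions between pairs of participants.

    replies -- iterable of (author, replied_to) name pairs (assumed input format)
    """
    pair_counts = Counter()
    for author, target in replies:
        if author != target:  # ignore self-replies
            pair_counts[tuple(sorted((author, target)))] += 1
    return pair_counts


def participant_totals(pair_counts):
    """Sum each participant's interactions across all of their pairs."""
    totals = Counter()
    for (a, b), n in pair_counts.items():
        totals[a] += n
        totals[b] += n
    return totals
```

Sorting `pair_counts` with `most_common()` shows which individuals interact the most, while `participant_totals` gives a per-person activity measure.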

4.4 SUMMARIZING MULTI-DOMAIN CONVERSATIONS

The summarization systems discussed up until this point have primarily been designed with particular

domains in mind and attempt to harness unique features of those domains. For example, meeting

summarization systems often use prosodic features while email summarizers derive metadata from

the email headers. In contrast, other summarization systems have been designed to work across a

variety of conversation domains and modalities. Here, we briefly discuss several such multi-domain

systems.

In early work on conversation summarization, Zechner [ 2002 ] investigates summarizing several

genres of speech, including spontaneous meeting speech. While this work focuses on spoken

modalities, the system is not speech-specific and could be applied to written conversations as well.

Though relevance detection in his work relies largely on tf.idf scores, Zechner also explored cross-

speaker information linking and question-answer detection, so that utterances can be extracted not

only according to high tf.idf scores, but also if they are linked to other informative utterances. This
