Systems such as that of Kleinbauer et al. [ 2007 ] assume that the meetings follow a particular
scenario, with the participants having distinct roles and the group working together towards a specific
goal. Each meeting in the scenario represents a particular design stage. The summarizer can create
rich and detailed abstracts for meetings that follow such a scenario. However, applying the system
to other types of meetings and conversations would require significant effort in terms of ontology
design, retraining, and so on.
Measuring Informativeness in Meeting Summarization Systems As evidenced by the case studies,
meeting summarization systems have typically taken one of two general approaches: feeding an
ASR transcript to a text summarization algorithm such as MMR, or using more speech-specific
approaches that may incorporate prosody and dialogue features. Penn and Zhu [ 2008 ] question the
true impact of “avant-garde” features such as speech prosody, showing that much of the improvement
those features brought could be captured by much simpler features measuring the length or duration of
each utterance. Similarly, Murray [ 2007 ] separates length and duration features from “true” prosodic
features and finds that length features are indeed a challenging baseline. However, he also finds
that respectable extractive summarization results, with AUROC scores as high as 0.74, can be
achieved using only true prosodic features such as energy and pitch, with no lexical or structural
features.
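To make the first of the two approaches above concrete, the MMR criterion can be sketched as follows. This is a minimal sketch, assuming a bag-of-words cosine similarity; the toy utterances and query are invented for illustration and are not from any of the systems discussed here.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_summarize(utterances, query, k=2, lam=0.7):
    """Greedy Maximal Marginal Relevance extraction.

    Each step selects the utterance maximizing
    lam * sim(utt, query) - (1 - lam) * max sim(utt, already_selected),
    trading off relevance against redundancy.
    """
    bows = [Counter(u.lower().split()) for u in utterances]
    qbow = Counter(query.lower().split())
    selected = []
    while len(selected) < k and len(selected) < len(utterances):
        best, best_score = None, float("-inf")
        for i, bow in enumerate(bows):
            if i in selected:
                continue
            relevance = cosine_sim(bow, qbow)
            redundancy = max((cosine_sim(bow, bows[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [utterances[i] for i in selected]

utts = [
    "we should finalize the remote control design",
    "the remote control design needs a final decision",
    "lunch will be at noon today",
]
summary = mmr_summarize(utts, "remote control design decision", k=2)
```

With the redundancy penalty, the two near-duplicate design utterances still both rank above the off-topic one here, but in a longer transcript the penalty steers later picks away from material already covered.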
In our later discussion on summarizing conversations across modalities in Section 4.4, we will
again see that, similar to the findings of Penn and Zhu, a competitive system need not incorporate
domain-specific features such as prosody. But in situations where a transcript might not be available, it
is notable that prosody alone can be useful for indicating informativeness, and one could generate
an audio summary using only features from the speech signal.
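The transcript-free scenario can be sketched as ranking utterances by a prosodic feature computed directly from the waveform. The sketch below uses only RMS energy; the synthetic sine tones stand in for real utterance audio, and an actual system would use richer energy and pitch statistics than this.

```python
import numpy as np

def rms_energy(samples):
    """Root-mean-square energy of an utterance's audio samples."""
    return float(np.sqrt(np.mean(np.square(samples))))

# Synthetic stand-ins for three utterances' audio: 1 s sine tones at
# 16 kHz with different amplitudes (louder utterances -> higher energy).
t = np.linspace(0, 1.0, 16000, endpoint=False)
utterance_audio = {
    "utt_quiet": 0.1 * np.sin(2 * np.pi * 120 * t),
    "utt_medium": 0.4 * np.sin(2 * np.pi * 120 * t),
    "utt_loud": 0.8 * np.sin(2 * np.pi * 120 * t),
}

# Rank utterances by energy; an audio summary could then simply
# concatenate the clips of the top-ranked utterances.
ranked = sorted(utterance_audio,
                key=lambda u: rms_energy(utterance_audio[u]),
                reverse=True)
```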
Beyond prosody and dialogue features, there has been little work investigating the use
of other “avant-garde” features available from the multi-modal data stream, such as notes, slides,
and whiteboard events. It remains to be seen how large an impact these features might have on
summarization performance.
Outputs and Interfaces for Meeting Summarization Systems With meeting summarization, there
is a wide range of possible outputs and interfaces. While informativeness might be determined as
discussed in the previous section, using perhaps a variety of text and speech features, the summary
output could be completely non-textual in order to minimize the exposure of end-users to noisy
ASR data. For instance, the summary could be a concatenation of the relevant audio clips, or a video
summary.
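A concatenative audio summary of this kind can be sketched in a few lines. This is a minimal sketch assuming the selected clips are already available as sample arrays at a common sample rate; the gap length is an arbitrary illustrative choice.

```python
import numpy as np

def concatenate_clips(clips, sample_rate=16000, gap_ms=200):
    """Join selected utterance clips into one audio summary,
    inserting a short silence between consecutive clips so the
    listener can tell where one extract ends and the next begins."""
    gap = np.zeros(int(sample_rate * gap_ms / 1000.0))
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(gap)
        pieces.append(clip)
    return np.concatenate(pieces) if pieces else np.zeros(0)

# Two selected clips of 0.5 s each at 16 kHz.
clip_a = np.ones(8000) * 0.2
clip_b = np.ones(8000) * 0.3
summary_audio = concatenate_clips([clip_a, clip_b])
```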
Otherwise, with meeting summarization, extractive systems are at a potential disadvantage
compared with abstractive systems, as the summary units will be disfluent utterances taken from the
noisy, error-filled ASR transcript. Even if the sentence classification is good, readers may find it very
tedious or difficult to read the extractive summary. A simple way to improve a meeting extract is to
remove filled pauses and try to repair some disfluencies.
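The filled-pause removal and simple repetition repair just described can be sketched with regular expressions. The filler list and repair rules below are illustrative assumptions, not the rules used by the systems discussed in this chapter, and real disfluency repair is considerably harder than this.

```python
import re

# Illustrative filled-pause inventory ("um", "uh", "er", "mm", "hmm", ...)
# with an optional trailing comma or period from the transcript.
FILLERS = re.compile(r"\b(?:um+|uh+|erm*|mm+|hmm+)\b[,.]?\s*", re.IGNORECASE)
# Immediate word repetitions such as "the the" or "we we we".
REPEAT = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)

def clean_utterance(text):
    """Remove filled pauses, then collapse immediate word repetitions."""
    text = FILLERS.sub("", text)
    text = REPEAT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_utterance("um so we we should uh finalize the the design")
# cleaned == "so we should finalize the design"
```

Even this shallow cleanup noticeably improves the readability of an extract, though it cannot repair restarts or self-corrections that span several words.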