Systems such as that of Kleinbauer et al. [ 2007 ] assume that the meetings follow a particular
scenario, with the participants having distinct roles and the group working together towards a specific
goal. Each meeting in the scenario represents a particular design stage. The summarizer can create
rich and detailed abstracts for meetings that follow such a scenario. However, applying the system
to other types of meetings and conversations would require significant effort in terms of ontology
design, retraining, and so on.
Measuring Informativeness in Meeting Summarization Systems As evidenced by the case studies,
meeting summarization systems have typically taken one of two general approaches: feeding an
ASR transcript to a text summarization algorithm such as MMR, or using more speech-specific
approaches that may incorporate prosody and dialogue features. Penn and Zhu [ 2008 ] question the
true impact of “avant-garde” features such as speech prosody, showing that much of the improvement
those features brought could be captured by much simpler features measuring the length or duration of
each utterance. Similarly, Murray [ 2007 ] separates length and duration features from “true” prosodic
features and finds that length features are indeed a challenging baseline. However, he also finds
that respectable extractive summarization results, with AUROC scores as high as 0.74, can be
achieved using only true prosodic features such as energy and pitch, with no lexical or structural
features.
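To make the first of the two approaches above concrete, the MMR criterion can be sketched as follows. This is a minimal sketch, assuming a bag-of-words cosine similarity; the toy utterances and query are invented for illustration and are not from any of the systems discussed here.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_summarize(utterances, query, k=2, lam=0.7):
    """Greedy Maximal Marginal Relevance extraction.

    Each step selects the utterance maximizing
    lam * sim(utt, query) - (1 - lam) * max sim(utt, already_selected),
    trading off relevance against redundancy.
    """
    bows = [Counter(u.lower().split()) for u in utterances]
    qbow = Counter(query.lower().split())
    selected = []
    while len(selected) < k and len(selected) < len(utterances):
        best, best_score = None, float("-inf")
        for i, bow in enumerate(bows):
            if i in selected:
                continue
            relevance = cosine_sim(bow, qbow)
            redundancy = max((cosine_sim(bow, bows[j]) for j in selected),
                             default=0.0)
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [utterances[i] for i in selected]

utts = [
    "we should finalize the remote control design",
    "the remote control design needs a final decision",
    "lunch will be at noon today",
]
summary = mmr_summarize(utts, "remote control design decision", k=2)
```

With the redundancy penalty, the two near-duplicate design utterances still both rank above the off-topic one here, but in a longer transcript the penalty steers later picks away from material already covered.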
In our later discussion on summarizing conversations across modalities in Section 4.4, we will
again see that, similar to the findings of Penn and Zhu, a competitive system need not incorporate
domain-specific features such as prosody. But in situations where a transcript might not be available, it
is notable that prosody alone can be useful for indicating informativeness, and one could generate
an audio summary using only features from the speech signal.
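The transcript-free scenario can be sketched as ranking utterances by a prosodic feature computed directly from the waveform. The sketch below uses only RMS energy; the synthetic sine tones stand in for real utterance audio, and an actual system would use richer energy and pitch statistics than this.

```python
import numpy as np

def rms_energy(samples):
    """Root-mean-square energy of an utterance's audio samples."""
    return float(np.sqrt(np.mean(np.square(samples))))

# Synthetic stand-ins for three utterances' audio: 1 s sine tones at
# 16 kHz with different amplitudes (louder utterances -> higher energy).
t = np.linspace(0, 1.0, 16000, endpoint=False)
utterance_audio = {
    "utt_quiet": 0.1 * np.sin(2 * np.pi * 120 * t),
    "utt_medium": 0.4 * np.sin(2 * np.pi * 120 * t),
    "utt_loud": 0.8 * np.sin(2 * np.pi * 120 * t),
}

# Rank utterances by energy; an audio summary could then simply
# concatenate the clips of the top-ranked utterances.
ranked = sorted(utterance_audio,
                key=lambda u: rms_energy(utterance_audio[u]),
                reverse=True)
```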
Beyond prosody and dialogue features, there has been little work investigating the use
of other “avant-garde” features available from the multi-modal data stream, such as notes, slides,
and whiteboard events. It remains to be seen how large an impact these features might have on
summarization performance.
Outputs and Interfaces for Meeting Summarization Systems With meeting summarization, there
is a wide range of possible outputs and interfaces. While informativeness might be determined as
discussed in the previous section, using perhaps a variety of text and speech features, the summary
output could be completely non-textual in order to minimize the exposure of end-users to noisy
ASR data. For instance, the summary could be a concatenation of the relevant audio clips, or a video
summary.
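A concatenative audio summary of this kind can be sketched in a few lines. This is a minimal sketch assuming the selected clips are already available as sample arrays at a common sample rate; the gap length is an arbitrary illustrative choice.

```python
import numpy as np

def concatenate_clips(clips, sample_rate=16000, gap_ms=200):
    """Join selected utterance clips into one audio summary,
    inserting a short silence between consecutive clips so the
    listener can tell where one extract ends and the next begins."""
    gap = np.zeros(int(sample_rate * gap_ms / 1000.0))
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(gap)
        pieces.append(clip)
    return np.concatenate(pieces) if pieces else np.zeros(0)

# Two selected clips of 0.5 s each at 16 kHz.
clip_a = np.ones(8000) * 0.2
clip_b = np.ones(8000) * 0.3
summary_audio = concatenate_clips([clip_a, clip_b])
```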
Otherwise, with meeting summarization, extractive systems are at a potential disadvantage
compared with abstractive systems, as the summary units will be disfluent utterances taken from the
noisy, error-filled ASR transcript. Even if the sentence classification is good, readers may find it very
tedious or difficult to read the extractive summary. A simple way to improve a meeting extract is to
remove filled pauses and try to repair some disfluencies.
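The filled-pause removal and simple repetition repair just described can be sketched with regular expressions. The filler list and repair rules below are illustrative assumptions, not the rules used by the systems discussed in this chapter, and real disfluency repair is considerably harder than this.

```python
import re

# Illustrative filled-pause inventory ("um", "uh", "er", "mm", "hmm", ...)
# with an optional trailing comma or period from the transcript.
FILLERS = re.compile(r"\b(?:um+|uh+|erm*|mm+|hmm+)\b[,.]?\s*", re.IGNORECASE)
# Immediate word repetitions such as "the the" or "we we we".
REPEAT = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)

def clean_utterance(text):
    """Remove filled pauses, then collapse immediate word repetitions."""
    text = FILLERS.sub("", text)
    text = REPEAT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_utterance("um so we we should uh finalize the the design")
# cleaned == "so we should finalize the design"
```

Even this shallow cleanup noticeably improves the readability of an extract, though it cannot repair restarts or self-corrections that span several words.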