We will start our discussion from meetings and then move to other conversational modalities.
For each proposal we will point out whether the proposed technique is unsupervised, supervised,
or a combination of the two. Unsupervised methods will be characterized by whether they follow a
cohesion-based approach, a probabilistic (LDA-based) approach, or some other approach. For supervised
methods, in contrast, we will pay special attention to the features used by the classifier.
Topic Modeling for Meeting Transcripts. The first comprehensive approach to topic segmentation
in meetings was presented by Galley et al. [2003]. Their method combines ideas from unsupervised
cohesion-based segmentation with a supervised approach. As in TextTiling, their unsupervised
cohesion-based technique, called LCSeg, first computes a cohesion score for each potential segment
boundary (every gap between two utterances) and then identifies as boundaries all the gaps where that
score drops significantly and reaches a minimum. One improvement over TextTiling is
that the cohesion function is not based simply on shallow word-vector similarity; it also considers
the more sophisticated notion of lexical chains (i.e., sequences of related words spanning multiple
sentences [Morris and Hirst, 1991]). Another advantage of LCSeg is that, instead of returning a
yes/no decision for each potential boundary, it can return a probability estimate, which
can later be used effectively in the supervised method.
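
The cohesion-based step can be illustrated with a minimal sketch in the TextTiling style. It is not LCSeg itself: it scores each gap with plain bag-of-words cosine similarity rather than lexical chains, and it returns hard boundaries rather than probability estimates. The function names, the `window` size, the `min_depth` threshold, and the toy transcript are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(utterance):
    """Lowercase an utterance and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", utterance.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion_scores(utterances, window=2):
    """Score gap i (between utterance i and i + 1) by the similarity of
    up to `window` utterances on each side of the gap."""
    bags = [Counter(tokenize(u)) for u in utterances]
    scores = []
    for gap in range(len(bags) - 1):
        left = sum(bags[max(0, gap - window + 1): gap + 1], Counter())
        right = sum(bags[gap + 1: gap + 1 + window], Counter())
        scores.append(cosine(left, right))
    return scores


def find_boundaries(scores, min_depth=0.3):
    """Return gaps that are local minima of the cohesion score and whose
    depth (total drop from the nearest peak on each side) is at least
    `min_depth`. The threshold is an illustrative assumption."""
    boundaries = []
    for i, s in enumerate(scores):
        left_ok = i == 0 or scores[i - 1] > s
        right_ok = i == len(scores) - 1 or scores[i + 1] >= s
        if not (left_ok and right_ok):
            continue
        # Climb to the nearest peak on each side of the candidate gap.
        left_peak = s
        for j in range(i - 1, -1, -1):
            if scores[j] < left_peak:
                break
            left_peak = scores[j]
        right_peak = s
        for j in range(i + 1, len(scores)):
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        if (left_peak - s) + (right_peak - s) >= min_depth:
            boundaries.append(i)
    return boundaries


# Hypothetical six-utterance transcript with a topic shift after utterance 2.
TOY_MEETING = [
    "the budget for the project is tight",
    "we need to cut the budget this quarter",
    "budget cuts affect the project",
    "lunch will be pizza today",
    "i prefer pizza with mushrooms for lunch",
    "pizza for lunch sounds great",
]
```

Calling `find_boundaries(cohesion_scores(TOY_MEETING))` on this toy transcript flags gap 2, the gap between the last budget utterance and the first lunch utterance, because the two windows around it share no words. A lexical-chain-based score, as in LCSeg, would replace `cosine` with a measure that also credits related but non-identical words.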
Their supervised method follows the standard approach we described for generic text, in which
topic segmentation is framed as a binary classification task. However, Galley et al., in addition to
cohesion-based and discourse marker features, also considered conversational features. Of these, the
ones that are not specific to meetings or speech and can be applied to other text conversations include:
(i) gaps/pauses/silences between utterances, under the assumption that the longer the gap the more
Figure 3.4: Topic modeling approaches for generic text. (Diagram labels: Topic Modeling; Topic Segmentation; Topic Labeling; Unsupervised; Supervised; Cohesion Based (TextTiling); Probabilistic Modeling (LDA); Binary Classification; Sequence Labeling; Multiclass Classification.)
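
The framing of topic segmentation as binary classification can be sketched as feature extraction over gaps: each gap between two utterances becomes one instance, described by the three feature families mentioned in the text (a cohesion score, a discourse marker cue, and the pause between utterances). This is only a sketch under stated assumptions. The `Utterance` type, the `gap_features` function, the feature names, and the cue-word list are hypothetical and are not taken from Galley et al.

```python
from dataclasses import dataclass

# Illustrative cue words only; not the list used by Galley et al.
CUE_WORDS = ("okay", "so", "anyway", "now")


@dataclass
class Utterance:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def gap_features(utterances, cohesion):
    """Build one feature dict per gap between consecutive utterances.

    `cohesion[i]` is a precomputed cohesion score (or probability) for
    the gap between utterance i and utterance i + 1.
    """
    if len(cohesion) != len(utterances) - 1:
        raise ValueError("need exactly one cohesion score per gap")
    rows = []
    for i in range(len(utterances) - 1):
        prev, nxt = utterances[i], utterances[i + 1]
        words = nxt.text.lower().split()
        first_word = words[0].strip(",.!?") if words else ""
        rows.append({
            "cohesion": cohesion[i],
            # Does the utterance after the gap open with a cue word?
            "starts_with_cue": first_word in CUE_WORDS,
            # Silence between the two utterances, clipped at zero
            # so that overlapping speech does not yield a negative pause.
            "pause": max(0.0, nxt.start - prev.end),
        })
    return rows
```

Each resulting row would then be paired with a boundary/no-boundary label from annotated transcripts and passed to whatever binary classifier is being used.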
 