likely is that a topic shift occurred; (ii) significant changes in speakership, under the assumption that
changes in the amount of activity of the different speakers correlate with changes in topic.
Galley et al. [2003] also performed an evaluation of their supervised approach by training and
testing a decision tree classifier 3 on the ICSI Meeting Corpus (see Chapter 2). They found that
although cohesion-based features are more critical than conversational ones, the system performs
best when all the features are used.
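To make the combination of feature types concrete, the sketch below computes two illustrative boundary features, a lexical-cohesion score (cosine similarity of the word counts on either side of a candidate boundary) and a speaker-change indicator, and combines them with a hand-written rule. This is only a minimal sketch: the feature names, window size, and threshold are assumptions, and the rule stands in for the trained decision tree used by Galley et al.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def boundary_features(utterances, speakers, i, window=2):
    """Features at the candidate boundary before utterance i: lexical
    cohesion across the boundary and whether the speaker changes."""
    left = Counter(w for u in utterances[max(0, i - window):i] for w in u.split())
    right = Counter(w for u in utterances[i:i + window] for w in u.split())
    return {
        "cohesion": cosine(left, right),          # low value suggests a shift
        "speaker_change": speakers[i - 1] != speakers[i],
    }

def is_topic_shift(feats, cohesion_threshold=0.2):
    """Stand-in for a trained classifier: flag a boundary when lexical
    cohesion is low and the speaker changes (threshold is illustrative)."""
    return feats["cohesion"] < cohesion_threshold and feats["speaker_change"]
```

In a real system these feature values would be fed to a learned classifier rather than a fixed rule; the point is only that cohesion-based and conversational evidence enter as separate features that the classifier can weigh jointly.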
As we mentioned before, the topical structure of a document can be either flat or hierarchical.
While in a flat structure the text is simply modeled as a sequence of topical segments with no further
decomposition, in a hierarchical topic model segments can be further divided into subtopics. The
idea of integrating unsupervised cohesion-based segmentation with a supervised approach has also
been applied to detect hierarchical topic models of meetings. Following Galley et al.'s approach
of combining LCSeg with a set of conversational features, Hsueh et al. [2006] explored how
to perform topic modeling of meetings at different levels of granularity. They start by observing that
a meeting can often be divided into a set of major topics, each of which can be further divided into more
refined sub-topics. For instance, a research project meeting could include as major topics status-report
and how to proceed, and how to proceed could be further segmented into experiment design and data
collection.
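The flat versus hierarchical distinction can be made concrete with a small data structure. The sketch below represents both forms for the running example; the utterance spans and the tree layout are illustrative assumptions, not taken from the corpus.

```python
# Flat segmentation: an ordered list of (topic, (start, end)) utterance spans.
flat = [
    ("status-report", (0, 120)),
    ("how to proceed", (120, 300)),
]

# Hierarchical segmentation: each major topic may carry sub-topic spans
# that partition its own span (topic names from the running example;
# span values are made up for illustration).
hierarchical = {
    "status-report": {"span": (0, 120), "subtopics": {}},
    "how to proceed": {
        "span": (120, 300),
        "subtopics": {
            "experiment design": (120, 210),
            "data collection": (210, 300),
        },
    },
}

def subtopic_boundaries(tree):
    """Collect the start positions of all sub-topic segments."""
    return sorted(start for topic in tree.values()
                  for (start, _end) in topic["subtopics"].values())
```

Note that every major-topic boundary in `flat` is also present in `hierarchical`; the hierarchical model only adds finer-grained boundaries inside major topics.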
In their experiments, again on the ICSI Meeting Corpus transcripts, they compare the performance
of different segmentation approaches on the two tasks of identifying major-topic vs. sub-topic
boundaries. Their findings indicate that the two tasks are quite different in this respect: while for
predicting major-topic shifts a supervised combination of lexical and conversational features works
best, for sub-topic shifts an unsupervised lexical-cohesion-based method performs as well as the
supervised one.
Topic modeling of conversations can also be framed as a probabilistic modeling problem by
extending the basic LDA unsupervised framework. All this work is technically quite sophisticated,
so we limit our treatment to the basic ideas and insights. Purver et al. [2006b] present an extension
of LDA that explicitly models a topic shift between two utterances with an additional binary hidden
variable c_u, one for each utterance, indicating whether there is a shift after that utterance (c_u = 1)
or not (c_u = 0). Using LDA terminology, a topic shift corresponds to a change in the probability
distribution over topics.
Figure 3.5 shows the graphical model corresponding to this variation of LDA. In this model,
each utterance in the conversation plays the role of a document in a collection, so what is D in Figure
3.3 becomes U here. By design, the distribution over topics for each utterance is conditioned on c_u.
Furthermore, since utterances are sequentially ordered, this model makes the Markov assumption
that the distribution over topics of an utterance depends on the distribution over topics of the
previous utterance (arrows pointing down connect the plate for u - 1 to the one for u, and the
one for u to the one for u + 1).
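A simplified generative sketch may help fix the intuition behind the shift variable. In the sketch below, each utterance either inherits the previous utterance's topic distribution (c_u = 0) or draws a fresh one from a symmetric Dirichlet (c_u = 1). This is an assumption-laden caricature of the model, not the authors' exact formulation or their inference procedure: the shift probability, the symmetric prior, and the sampling scheme are all illustrative choices.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional sample from a symmetric Dirichlet(alpha)
    via normalized Gamma draws."""
    xs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

def generate_conversation(n_utt, n_topics, shift_prob=0.3, alpha=0.5, seed=0):
    """Simplified generative process: c_u ~ Bernoulli(shift_prob); the
    utterance's topic distribution theta_u is resampled when c_u = 1 and
    copied from theta_{u-1} otherwise, which is the Markov dependence
    between consecutive utterances described in the text."""
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, n_topics, rng)
    shifts, thetas = [], []
    for u in range(n_utt):
        c_u = 1 if u > 0 and rng.random() < shift_prob else 0
        if c_u:
            theta = sample_dirichlet(alpha, n_topics, rng)  # topic shift
        shifts.append(c_u)
        thetas.append(theta)
    return shifts, thetas
```

Runs of utterances with c_u = 0 share the same distribution over topics, so recovering the hidden c_u values during inference amounts to segmenting the conversation into topically coherent blocks.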
3 See Poole and Mackworth [2010] for an introduction to decision trees.