likely is that a topic shift occurred; (ii) significant changes in speakership, under the assumption that
changes in the amount of activity of the different speakers correlate with changes in topic.
Galley et al. [2003] also performed an evaluation of their supervised approach by training and
testing a decision tree classifier 3 on the ICSI Meeting Corpus (see Chapter 2). They found that
although cohesion-based features are more critical than conversational ones, the system performs
best when all the features are used.
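To make the combination of feature types concrete, the sketch below computes two illustrative boundary features, a lexical-cohesion score (cosine similarity of the word counts on either side of a candidate boundary) and a speaker-change indicator, and combines them with a hand-written rule. This is only a minimal sketch: the feature names, window size, and threshold are assumptions, and the rule stands in for the trained decision tree used by Galley et al.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def boundary_features(utterances, speakers, i, window=2):
    """Features at the candidate boundary before utterance i: lexical
    cohesion across the boundary and whether the speaker changes."""
    left = Counter(w for u in utterances[max(0, i - window):i] for w in u.split())
    right = Counter(w for u in utterances[i:i + window] for w in u.split())
    return {
        "cohesion": cosine(left, right),          # low value suggests a shift
        "speaker_change": speakers[i - 1] != speakers[i],
    }

def is_topic_shift(feats, cohesion_threshold=0.2):
    """Stand-in for a trained classifier: flag a boundary when lexical
    cohesion is low and the speaker changes (threshold is illustrative)."""
    return feats["cohesion"] < cohesion_threshold and feats["speaker_change"]
```

In a real system these feature values would be fed to a learned classifier rather than a fixed rule; the point is only that cohesion-based and conversational evidence enter as separate features that the classifier can weigh jointly.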
As we mentioned before, the topical structure of a document can be either flat or hierarchical.
While in a flat structure the text is simply modeled as a sequence of topical segments with no further
decomposition, in a hierarchical topic model segments can be further divided into subtopics. The
idea of integrating unsupervised cohesion-based segmentation with a supervised approach has also
been applied to detect hierarchical topic models of meetings. Following Galley et al.'s approach
of combining LCSeg with a set of conversational features, Hsueh et al. [2006] explored how
to perform topic modeling of meetings at different levels of granularity. They start by observing that
a meeting can often be divided into a set of major topics, each of which can be further divided into more
refined sub-topics. For instance, a research project meeting could include as major topics status-report
and how to proceed, and how to proceed could be further segmented into experiment design and data
collection.
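The flat versus hierarchical distinction can be made concrete with a small data structure. The sketch below represents both forms for the running example; the utterance spans and the tree layout are illustrative assumptions, not taken from the corpus.

```python
# Flat segmentation: an ordered list of (topic, (start, end)) utterance spans.
flat = [
    ("status-report", (0, 120)),
    ("how to proceed", (120, 300)),
]

# Hierarchical segmentation: each major topic may carry sub-topic spans
# that partition its own span (topic names from the running example;
# span values are made up for illustration).
hierarchical = {
    "status-report": {"span": (0, 120), "subtopics": {}},
    "how to proceed": {
        "span": (120, 300),
        "subtopics": {
            "experiment design": (120, 210),
            "data collection": (210, 300),
        },
    },
}

def subtopic_boundaries(tree):
    """Collect the start positions of all sub-topic segments."""
    return sorted(start for topic in tree.values()
                  for (start, _end) in topic["subtopics"].values())
```

Note that every major-topic boundary in `flat` is also present in `hierarchical`; the hierarchical model only adds finer-grained boundaries inside major topics.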
In their experiments, again on the ICSI Meeting Corpus transcripts, they compare the performance
of different segmentation approaches on the two tasks of identifying major-topic vs. sub-topic
boundaries. Their findings indicate that the two tasks are quite different in this respect: while for
predicting major-topic shifts a supervised combination of lexical and conversational features works
best, for sub-topic shifts an unsupervised lexical-cohesion-based method performs as well as the
supervised one.
Topic modeling of conversations can also be framed as a probabilistic modeling problem by
extending the basic LDA unsupervised framework. All this work is technically quite sophisticated,
so we limit our treatment to the basic ideas and insights. Purver et al. [2006b] present an extension
of LDA that explicitly models a topic shift between two utterances with an additional binary hidden
variable c_u, one for each utterance, indicating whether there is a shift after that utterance (c_u = 1)
or not (c_u = 0). Using LDA terminology, a topic shift corresponds to a change in the probability
distribution over topics.
Figure 3.5 shows the graphical model corresponding to this variation of LDA. In this model,
each utterance in the conversation plays the role of a document in a collection, so what is D in Figure
3.3 becomes U here. By design, the distribution over topics for each utterance is conditioned on c_u.
Furthermore, since utterances are sequentially ordered, this model makes the Markov assumption
that the distribution over topics of an utterance depends on the distribution over topics of the
previous utterance (arrows pointing down connect the plate for u - 1 to the one for u, and the
one for u to the one for u + 1).
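A simplified generative sketch may help fix the intuition behind the shift variable. In the sketch below, each utterance either inherits the previous utterance's topic distribution (c_u = 0) or draws a fresh one from a symmetric Dirichlet (c_u = 1). This is an assumption-laden caricature of the model, not the authors' exact formulation or their inference procedure: the shift probability, the symmetric prior, and the sampling scheme are all illustrative choices.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional sample from a symmetric Dirichlet(alpha)
    via normalized Gamma draws."""
    xs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

def generate_conversation(n_utt, n_topics, shift_prob=0.3, alpha=0.5, seed=0):
    """Simplified generative process: c_u ~ Bernoulli(shift_prob); the
    utterance's topic distribution theta_u is resampled when c_u = 1 and
    copied from theta_{u-1} otherwise, which is the Markov dependence
    between consecutive utterances described in the text."""
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, n_topics, rng)
    shifts, thetas = [], []
    for u in range(n_utt):
        c_u = 1 if u > 0 and rng.random() < shift_prob else 0
        if c_u:
            theta = sample_dirichlet(alpha, n_topics, rng)  # topic shift
        shifts.append(c_u)
        thetas.append(theta)
    return shifts, thetas
```

Runs of utterances with c_u = 0 share the same distribution over topics, so recovering the hidden c_u values during inference amounts to segmenting the conversation into topically coherent blocks.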
3 See Poole and Mackworth [2010] for an introduction to decision trees.