Mining Text Conversations - Methods for Mining and Summarizing Text Conversations


We will start our discussion with meetings and then move to other conversational modalities.

For each proposal we will point out whether the proposed technique is unsupervised, supervised,

or a combination of the two. Unsupervised methods will be characterized with respect to whether

they follow a cohesion-based approach, a probabilistic (LDA) approach, or some other approach. In contrast, for supervised

methods we will give special attention to what features are used by the classifier.

Topic Modeling for Meeting Transcripts. The first comprehensive approach to topic segmentation

in meetings was presented by Galley et al. [2003]. Their method combines ideas from unsupervised

cohesion-based segmentation with a supervised approach. As in TextTiling, their unsupervised

cohesion-based technique, called LCSeg, first computes a cohesion score for each potential segment

boundary (every gap between two utterances) and then identifies as boundaries all the gaps where that

score drops significantly and reaches a minimum. One improvement with respect to TextTiling is

that their cohesion function is not based simply on shallow word-vector similarity but also considers

the more sophisticated notion of lexical chains (i.e., sequences of related words spanning multiple

sentences [Morris and Hirst, 1991]). Another advantage of LCSeg is that instead of returning a

yes/no boundary decision for each potential boundary, it can return a probability estimate, which

can later be used effectively in the supervised method.
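
The cohesion-based idea described above (score every gap between utterances, then pick the gaps where the score drops to a sufficiently deep minimum) can be illustrated with a minimal sketch. It is a simplified TextTiling-style segmenter based only on bag-of-words cosine similarity; it does not implement LCSeg's lexical chains or its probability estimates. The function names, the `window` size, and the `min_depth` threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_cohesion(utterances, window=2):
    """Cohesion score for every gap between two consecutive utterances.

    scores[i] describes the gap between utterance i and utterance i + 1.
    Each side of the gap is the merged word counts of up to `window` utterances.
    """
    bags = [Counter(u.lower().split()) for u in utterances]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def boundaries(scores, min_depth=0.3):
    """Indices of utterances that start a new segment.

    A gap is kept when its score is a local minimum and the combined drop
    from the nearest peak on each side is at least `min_depth` (an assumed
    threshold, not a value taken from the source).
    """
    found = []
    last = len(scores) - 1
    for i, s in enumerate(scores):
        is_min = (i == 0 or scores[i - 1] > s) and (i == last or scores[i + 1] >= s)
        if not is_min:
            continue
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1  # climb to the nearest peak on the left
        right = i
        while right < last and scores[right + 1] >= scores[right]:
            right += 1  # climb to the nearest peak on the right
        depth = (scores[left] - s) + (scores[right] - s)
        if depth >= min_depth:
            found.append(i + 1)
    return found
```

Calling `boundaries(gap_cohesion(utterances))` on a list of utterance strings returns the positions at which a new topic segment is hypothesized to begin. A fuller treatment would remove stop words and stem tokens before counting, and the first and last gaps may need special handling because they have a peak on one side only.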

Their supervised method follows the standard approach we described for generic text, in which

topic segmentation is framed as a binary classification task. However, in addition to

cohesion-based and discourse-marker features, Galley et al. also considered conversational features. Of these, the

ones that are not meeting- or speech-specific, and can be applied to other text conversations, include:

(i) gaps/pauses/silences between utterances, under the assumption that the longer the gap the more

[Figure: diagram with node labels Topic Modeling; Topic Segmentation; Topic Labeling; Supervised; Unsupervised; Cohesion Based (TextTiling); Binary Classification; Sequence Labeling; Multiclass Classification; Probabilistic Modeling (LDA).]

Figure 3.4: Topic modeling approaches for generic text.
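
The supervised framing discussed above, in which each gap between utterances is classified as boundary or non-boundary, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of Galley et al.: the feature set is limited to the three kinds of evidence named in the text (a cohesion score, a discourse-marker cue, and the pause before the next utterance), the `CUE_WORDS` list is hypothetical, and a plain logistic regression trained by stochastic gradient ascent stands in for whatever learner the original work used.

```python
import math
from dataclasses import dataclass

# Hypothetical cue-word list; the source does not enumerate the discourse markers used.
CUE_WORDS = {"okay", "so", "anyway", "now"}


@dataclass
class Utterance:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def gap_features(prev: Utterance, nxt: Utterance, cohesion: float):
    """Feature vector for the gap between two consecutive utterances."""
    pause = max(0.0, nxt.start - prev.end)
    tokens = nxt.text.lower().split()
    starts_with_cue = 1.0 if tokens and tokens[0] in CUE_WORDS else 0.0
    # The leading 1.0 is a bias term.
    return [1.0, cohesion, pause, starts_with_cue]


def boundary_probability(weights, x):
    """Logistic-regression estimate of P(boundary | gap features)."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = target - boundary_probability(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights
```

Given labeled gaps, `train_logistic` returns one weight per feature, and `boundary_probability` then scores unseen gaps; thresholding that probability at 0.5 yields the binary boundary decision. The cohesion feature could be the output of an unsupervised segmenter, which is how the text describes the two methods being combined.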
