1.4.1 MINING TEXT CONVERSATIONS
A small set of basic yet challenging questions can be asked about any text conversation: What topics are covered in the conversation? What opinions do participants express on those topics? What is the structure of the conversation or, more specifically, what is the intended function of each particular message
(or sentence) and its relationship to other contributions?
We can consider these questions in order.
Topic Modeling: Topic Segmentation and Topic Labeling Conversations often span different topics;
an initial email message, asking a team to explain low sales in Asia, can generate a thread on what
the best visualization tool is for a particular analysis task. Or, alternatively, the follow-up may be a
discussion of how the team may need to be reorganized.
Even a look at our short sample email conversation shows that it clearly covers at least two topics.
The conversation starts with a proposal for a vacation, but then one sub-thread (on the right of
Figure 1.4) veers off into a discussion of a problematic course assignment.
This example can help us to define the two basic subtasks of topic modeling: topic segmentation
and topic labeling. In topic segmentation, you are interested in identifying what portions of the
conversation are about the same topic, or equivalently, in detecting where in the conversation the
topic shifts are. For instance, in our sample conversation, there is a topic shift between the first
and the second (non-quoted) sentences in Email-1.2, and this shift splits the conversation into two
segments, i.e., the text below the shift in the right sub-thread vs. the rest of the conversation.
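To make the segmentation subtask concrete, here is a minimal sketch (not from the text) of a lexical-cohesion heuristic in the spirit of TextTiling: a topic shift is hypothesized wherever the bag-of-words similarity between adjacent messages drops below a threshold. The function names, toy messages, and threshold value are all illustrative assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def segment(messages, threshold=0.1):
    # Flag message indices where lexical cohesion with the
    # previous message drops below the threshold (a topic shift).
    bows = [Counter(m.lower().split()) for m in messages]
    return [i for i in range(1, len(bows))
            if cosine(bows[i - 1], bows[i]) < threshold]

msgs = [
    "let's plan the spring break trip to mexico",
    "mexico sounds great maybe skiing instead",
    "by the way the course assignment question makes no sense",
]
print(segment(msgs))  # → [2]: the shift to the assignment topic
```

Real segmenters smooth similarity over windows of sentences and pick shift points at local minima rather than using a fixed threshold, but the underlying cohesion signal is the same.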
Topic labeling, on the other hand, is about generating informative labels (typically sets of
words) for all the topics covered by a conversation. In our example, two informative (but still not
ideal) labels for the two identified topics might be “spring break Mexico skiing” and “assignment
question idea”.
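A correspondingly naive labeling heuristic simply ranks the non-stopword terms of a segment by frequency and keeps the top few as the label. The sketch below is an invented illustration, not a method from the text; the stopword list and names are hypothetical.

```python
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "to", "for", "and", "of"}

def label_topic(segment_text, k=4):
    # Keep the k most frequent alphabetic, non-stopword tokens.
    words = [w for w in segment_text.lower().split()
             if w.isalpha() and w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

text = "spring break in Mexico maybe skiing in Mexico spring break skiing"
print(label_topic(text))  # → ['spring', 'break', 'mexico', 'skiing']
```

Frequency alone tends to surface generic words; weighting terms by how specific they are to a segment (e.g., tf-idf against the other segments) usually yields more informative labels.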
A large number of topic modeling techniques have been developed for generic text (not neces-
sarily conversational in nature), including supervised and unsupervised machine learning methods,
as well as combinations of the two. Among all these proposals, a novel probabilistic approach
based on Latent Dirichlet Allocation (LDA) [Blei et al., 2003] appears to be the most effective and
influential (see Blei and Lafferty [2009] for a gentle introduction). In LDA, the generation of a
collection of documents is modeled as a stochastic process, and topic modeling consists of estimating
the parameters of the underlying probabilistic generative model.
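LDA's generative story can be sketched directly: for each document, draw a topic mixture from a Dirichlet prior; for each word position, draw a topic from that mixture and then a word from that topic's word distribution. The toy topics, vocabulary, and hyperparameters below are invented for illustration; actual topic modeling runs this process in reverse, estimating the hidden mixtures and topic-word distributions from observed documents.

```python
import random

def sample_dirichlet(alpha):
    # Draw from a Dirichlet by normalizing independent Gamma draws.
    xs = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(xs)
    return [x / total for x in xs]

def sample_categorical(probs, items):
    # Draw one item according to the given probabilities.
    r, acc = random.random(), 0.0
    for p, item in zip(probs, items):
        acc += p
        if r < acc:
            return item
    return items[-1]

# Two hypothetical topics, each a distribution over a toy vocabulary.
topics = [
    {"vacation": 0.5, "mexico": 0.3, "skiing": 0.2},        # topic 0
    {"assignment": 0.5, "question": 0.3, "deadline": 0.2},  # topic 1
]

def generate_document(n_words, alpha=(0.5, 0.5)):
    # Per-document topic mixture theta, then word-by-word generation.
    theta = sample_dirichlet(alpha)
    doc = []
    for _ in range(n_words):
        z = sample_categorical(theta, range(len(topics)))  # topic for this word
        words = list(topics[z])
        doc.append(sample_categorical([topics[z][w] for w in words], words))
    return doc

random.seed(0)
print(generate_document(8))
```

Each generated document mixes vacation and assignment words in proportions governed by its own theta, mirroring how a single email can span both topics of our example.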
In Chapter 3, we will discuss how topic modeling techniques developed for generic text can
be extended to deal with text conversations. For instance, we will show how variations of the LDA
framework have been successfully applied to meeting transcripts [Purver et al., 2006b], as well as to
Twitter [Ramage et al., 2010] and email conversations [Dredze et al., 2008].
Sentiment and Subjectivity (i.e., Opinion Mining) Conversations typically exhibit a large amount
of highly subjective content. Participants may agree or disagree with one another, argue for or against
various proposals, and generally take turns expressing their opinions and emotions. The task of
mining all this subjective content can be framed at different levels of granularity. At the highest level, you have