Mining Text Conversations - Methods for Mining and Summarizing Text Conversations

3.4.5 EXTRACTING THE CONVERSATIONAL STRUCTURE

At first glance, extracting the structure of a conversation may seem a rather easy task. For asynchronous conversations, such as emails and blogs, the conversational structure should be fully revealed by the reply-to relation between messages; for synchronous conversations, such as meetings and chats, the structure should simply consist of the linear sequence of turns appearing one after the other as the conversation evolves.
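
To make this naive view concrete, here is a minimal Python sketch. It assumes each message is given as a hypothetical (msg_id, reply_to) pair and each chat turn as an opaque id. It recovers only the structure the simple view promises: a reply tree for asynchronous conversations and a linear chain of turns for synchronous ones.

```python
from collections import defaultdict


def reply_tree(messages):
    """Build the reply-to tree of an asynchronous thread.

    `messages` is a list of (msg_id, reply_to) pairs (hypothetical input
    format); reply_to is None for a message that starts the thread.
    Returns the thread roots and a map from each message id to its replies.
    """
    children = defaultdict(list)
    roots = []
    for msg_id, reply_to in messages:
        if reply_to is None:
            roots.append(msg_id)
        else:
            children[reply_to].append(msg_id)
    return roots, dict(children)


def linear_chain(turns):
    """Link each turn of a synchronous conversation to the turn before it."""
    return [(turns[i - 1], turns[i]) for i in range(1, len(turns))]


# Usage with invented message ids and chat turns.
roots, children = reply_tree([("e1", None), ("e2", "e1"), ("e3", "e1"), ("e4", "e2")])
print(roots, children)                   # ['e1'] {'e1': ['e2', 'e3'], 'e2': ['e4']}
print(linear_chain(["t1", "t2", "t3"]))  # [('t1', 't2'), ('t2', 't3')]
```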

On closer inspection, however, this initial, simple view of conversational structure has two limitations. First, in asynchronous conversations the use of quotation can express a conversational structure at a finer level of granularity than the one revealed by reply-to relations between emails or blog posts. For instance, as we saw in the email example in Figure 3.10, the proximity between a quoted paragraph and an unquoted one can represent an informative conversational link between the two (such as a question/answer adjacency pair) that would not appear by looking only at the reply-to relations. Second, the linear structure of synchronous conversations can be misleading in its simplicity. An empirical analysis of such conversations, both meetings and chats, shows that what appears to be a single, linear conversation may in fact contain several simultaneous conversations that need to be disentangled.


Let us now examine how we can deal with these two additional complexities when mining text conversations. More specifically, how can we extract the finer-grained conversational structure induced by the use of quotation in asynchronous conversations? And how can we disentangle the simultaneous conversations hidden in seemingly single, linear synchronous conversations?

Building the Fragment Quotation Graph

In asynchronous conversations, consecutive turns can be far apart in time. For this reason, when people reply to an email or comment on a blog post, a quotation of the original message is often included by default in the draft reply to preserve context. Furthermore, people tend to break down the quoted message so that different questions, requests, or claims can be dealt with separately. If, for instance, the original message asks multiple questions, the replier might type each answer under the corresponding question. As a result, each message, unless it is at the beginning of a thread, will contain a mix of quoted and novel paragraphs that may well reflect a reply-to relationship between paragraphs (or sentences) at a finer level of granularity than the one explicitly recorded between emails.
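
The following Python sketch illustrates how such paragraph-level links can be read off a single reply. It assumes the common, but not universal, convention that quoted lines begin with ">". The function names split_fragments and quote_reply_links are hypothetical, and the linking rule (a new fragment replies to the quoted fragment immediately above it) is a deliberate simplification, not the FQG construction of Carenini et al. [2007].

```python
def split_fragments(body, quote_prefix=">"):
    """Group consecutive lines of an email body into (kind, text) fragments.

    Assumption: quoted lines start with quote_prefix (">" by convention).
    kind is "quoted" for those lines and "new" otherwise. A blank line, or a
    switch between quoted and new text, ends the current fragment.
    """
    fragments = []
    kind_now, lines_now = None, []

    def flush():
        # Emit the fragment collected so far, if any.
        if lines_now:
            fragments.append((kind_now, " ".join(lines_now)))

    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            flush()
            kind_now, lines_now = None, []
            continue
        kind = "quoted" if line.startswith(quote_prefix) else "new"
        # Strip all leading quote markers (handles nested quotes like ">>").
        text = line.lstrip(quote_prefix).strip() if kind == "quoted" else line
        if kind != kind_now:
            flush()
            kind_now, lines_now = kind, []
        lines_now.append(text)
    flush()
    return fragments


def quote_reply_links(fragments):
    """Hypothetical rule: a new fragment that directly follows a quoted one
    is taken to reply to it (e.g., an answer typed under a question)."""
    return [
        (text, prev_text)
        for (prev_kind, prev_text), (kind, text) in zip(fragments, fragments[1:])
        if prev_kind == "quoted" and kind == "new"
    ]


# Invented reply, used only for illustration.
reply = """> Can you send the report by Friday?
Yes, I will send it on Thursday.

> Who else should review it?
Please add the legal team.
"""

for answer, question in quote_reply_links(split_fragments(reply)):
    print(f"{answer!r} replies to {question!r}")
```

Email clients differ in how they mark quotations, so a practical system would need more robust quote detection than a fixed prefix.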

Carenini et al. [2007] propose a novel approach to capture this finer-level conversational structure of asynchronous text conversations in the form of a Fragment Quotation Graph (FQG). We describe the construction of a sample FQG by following an example originally presented in Carenini et al. [2007].

Figure 3.14(a) shows a real example of a conversation from the Enron Corpus involving six emails. For the sake of illustration, we do not show the original text but abbreviate it as a sequence of labels <a, b, c, ..., j>, each corresponding to a text fragment, typically a sentence or a paragraph. To build an FQG, you follow a two-step process.
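
As a toy illustration of the label abbreviation, the Python sketch below encodes a short, invented thread (not the six emails of Figure 3.14(a)) as label sequences. It marks each fragment as new or quoted according to whether the same label already appeared earlier in the thread. The data and the function name label_fragments are hypothetical, and this is not presented as the FQG construction procedure of Carenini et al. [2007].

```python
# Hypothetical, invented data: NOT the actual emails of Figure 3.14(a).
# Each email is abbreviated, in chronological order, as a list of fragment labels.
thread = [
    ("E1", ["a", "b"]),
    ("E2", ["c", "a", "d"]),       # quotes a; c and d are new
    ("E3", ["a", "e", "b", "f"]),  # quotes a and b, answering each in turn
]


def label_fragments(thread):
    """Mark each label 'new' the first time it appears, 'quoted' afterwards.

    Simplifying assumptions: identical labels denote identical text, and the
    thread is listed in chronological order.
    """
    seen = set()
    marked = {}
    for email_id, labels in thread:
        marked[email_id] = [(lab, "quoted" if lab in seen else "new") for lab in labels]
        seen.update(labels)
    return marked


for email_id, frags in label_fragments(thread).items():
    print(email_id, frags)
```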
