3.4.5 EXTRACTING THE CONVERSATIONAL STRUCTURE
At first glance, extracting the structure of a conversation may seem a rather easy task. For
asynchronous conversations, like emails and blogs, the conversational structure should be fully
revealed by the reply-to relation between messages; for synchronous conversations, such as
meetings and chats, the structure should simply consist of the linear sequence of turns
appearing one after the other as the conversation evolves.
However, on closer inspection, this initial, simple view of conversational structure has two
limitations. First, in asynchronous conversations the use of quotation can express a
conversational structure at a finer level of granularity than the one revealed by reply-to
relations between emails or blog posts. For instance, as we saw in the email example in Figure
3.10, the proximity between a quoted paragraph and an unquoted one can represent an informative
conversational link between the two (i.e., a question/answer adjacency pair) that would not appear
by only looking at the reply-to relations. Second, the linear structure of synchronous conversations
can be misleading in its simplicity. An empirical analysis of such conversations, both meetings
and chats, shows that what appears to be a single, linear conversation may in fact contain several
simultaneous conversations that need to be disentangled.
Let us now examine how we can deal with these two additional complexities for mining text
conversations. More specifically, how can we extract the finer-granularity conversational structure
induced by the use of quotation in asynchronous conversations? And how can we disentangle
simultaneous conversations in seemingly single, linear synchronous conversations?
Building the Fragment Quotation Graph
Since in asynchronous conversations consecutive turns
can be far apart in time, when people reply to an email or comment on a blog post, a quotation
of the original message is often included by default in the draft reply in order to preserve context.
Furthermore, people tend to break down the quoted message so that different questions, requests or
claims can be dealt with separately. If, for instance, the original message is asking multiple questions,
the replier might type each answer under the corresponding question. As a result, each message,
unless it is at the beginning of a thread, will contain a mix of quoted and novel paragraphs that
may well reflect a reply-to relationship between paragraphs (or sentences) that is at a finer level of
granularity than the one explicitly recorded between emails.
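The mix of quoted and novel paragraphs described above can be made concrete with a small sketch. It is not the procedure of Carenini et al. [2007]; it only illustrates how one might separate quoted from new text and pair each new fragment with the quoted fragment directly above it. It assumes that quoted lines start with the conventional ">" marker and that blank lines separate fragments. The function names `split_fragments` and `adjacent_pairs` and the sample message are hypothetical.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[str, str]  # (kind, text), where kind is "quoted" or "new"


def split_fragments(body: str, quote_prefix: str = ">") -> List[Fragment]:
    """Split a message body into quoted and new fragments.

    Assumption: quoted lines begin with `quote_prefix`; a blank line or a
    change between quoted and new text ends the current fragment.
    """
    fragments: List[Fragment] = []
    kind: Optional[str] = None
    lines: List[str] = []

    def flush() -> None:
        # Emit the fragment collected so far, if any, and reset the state.
        nonlocal kind, lines
        if lines:
            fragments.append((kind, " ".join(lines)))
        kind, lines = None, []

    for raw in body.splitlines():
        line = raw.strip()
        is_quoted = line.startswith(quote_prefix)
        # Drop the quote marker(s) and the spaces that follow them.
        text = line.lstrip(quote_prefix + " ") if is_quoted else line
        if not text:
            flush()
            continue
        line_kind = "quoted" if is_quoted else "new"
        if line_kind != kind:
            flush()
            kind = line_kind
        lines.append(text)
    flush()
    return fragments


def adjacent_pairs(fragments: List[Fragment]) -> List[Tuple[str, str]]:
    """Pair each new fragment with the quoted fragment immediately before it."""
    return [
        (prev_text, text)
        for (prev_kind, prev_text), (cur_kind, text) in zip(fragments, fragments[1:])
        if prev_kind == "quoted" and cur_kind == "new"
    ]


# Invented sample reply: two quoted questions, each answered inline.
reply = (
    "> Can we meet Monday?\n"
    "Yes, Monday works.\n"
    "\n"
    "> Who brings the slides?\n"
    "I will.\n"
)
pairs = adjacent_pairs(split_fragments(reply))
```

Here each pair links an answer to the question it was typed under, a relation that the message-level reply-to header alone would not expose. Real mail clients vary in how they mark quotations, so the ">" assumption would need to be adapted for other formats.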
Carenini et al. [2007] propose a novel approach to capture this finer-level conversational
structure of asynchronous text conversations in the form of a Fragment Quotation Graph (FQG).
We describe the construction of a sample FQG by following an example originally presented
in Carenini et al. [2007].
Figure 3.14(a) shows a real example of a conversation from the Enron Corpus involving six
emails. For the sake of illustration, we do not show the original text, but abbreviate it as a sequence
of labels <a, b, c, ..., j>, each one corresponding to a text fragment, typically a sentence or a
paragraph. To build an FQG, you follow a two-step process.