Mining Text Conversations - Methods for Mining and Summarizing Text Conversations


email posing the two questions Q1 and Q2. However, this is not the case for the answer A22, which appears in Email-5, after two emails, Email-3 and Email-4.

Email-1
    Question Q1
    Question Q2

Email-2
    >Question Q1
    >Question Q2
    Answer A21

Email-3
    >Question Q1
    Answer A11
    >Question Q2

Email-4
    >>Question Q1
    Answer A12
    >Answer A11
    >>Question Q2

Email-5
    >>>Question Q1
    >Answer A12
    >>Answer A11
    >>>Question Q2
    Answer A22

Figure 3.10: Sample email thread that starts with Email-1, which contains two questions, Q1 and Q2. These questions receive multiple answers in the following four emails. An answer labeled Aij is answer j to question i (for example, A21 is the first answer to Q2).

For the answer detection task, Shrestha and McKeown also propose a supervised classification approach: given a question q, a binary classifier determines, for each utterance u_i that follows q in the thread, whether or not u_i is a response to q. Even with a large and complex set of features, based on the lexical similarity between q and u_i as well as the positions of q and u_i in the thread, the performance of this approach is modest (F-scores range from 0.5 to 0.7, depending on the training data).
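To make the setup concrete, the following is a minimal sketch of how candidate (q, u_i) pairs and their features could be computed before being passed to a binary classifier. It is not Shrestha and McKeown's actual feature set: the function names, the bag-of-words cosine similarity, and the two position features are assumptions chosen only to illustrate the two feature families mentioned above (lexical similarity and position in the thread).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(thread, q_index, u_index):
    """Illustrative features for the pair (q, u_i).

    `thread` is a list of utterance strings in thread order
    (a hypothetical representation, not the one used in the source).
    """
    q, u = thread[q_index], thread[u_index]
    return {
        "lexical_similarity": cosine_similarity(tokenize(q), tokenize(u)),
        "distance": u_index - q_index,  # utterances between q and u_i, plus one
        "relative_position": u_index / max(len(thread) - 1, 1),
    }


def candidate_pairs(thread, q_index):
    """One feature dict for every utterance that follows the question q."""
    return [
        (u_index, pair_features(thread, q_index, u_index))
        for u_index in range(q_index + 1, len(thread))
    ]


thread = [
    "Where is the meeting room?",
    "The meeting room is on floor two.",
    "Thanks, see you then.",
]
for u_index, features in candidate_pairs(thread, 0):
    print(u_index, features)
```

Each feature dictionary would become one training or test instance for the classifier, labeled according to whether u_i actually answers q.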

One critical limitation of this work is that it does not consider quotation as a source of information. As we will see in Section 3.4.5, quotation can be effectively exploited to create a finer-level representation of the conversational structure, which, we will argue, can simplify several mining tasks, including dialogue act labeling. For instance, looking again at Figure 3.10, the answer A22 is far from Email-1 (which posed the corresponding question Q2), but it is adjacent to the quotation of Q2 (in Email-5).
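The adjacency observation can be sketched in a few lines of code. The example below assumes that quoted lines are marked with leading '>' characters, as in Figure 3.10, and that the quoted text matches the original question exactly. The helper names are hypothetical, and this is only an illustration of the idea, not the method developed in Section 3.4.5.

```python
def quote_depth(line):
    """Count the leading '>' markers on a line, ignoring spaces between them."""
    depth = 0
    for ch in line:
        if ch == ">":
            depth += 1
        elif ch != " ":
            break
    return depth


def strip_quote(line):
    """Remove leading quote markers and surrounding whitespace."""
    return line.lstrip("> ").strip()


def new_text_after_quote(email_lines, quoted_text):
    """Return the unquoted lines that directly follow a quotation of `quoted_text`."""
    found = []
    i = 0
    while i < len(email_lines):
        line = email_lines[i]
        if quote_depth(line) > 0 and strip_quote(line) == quoted_text:
            j = i + 1
            # Collect new (depth 0), non-empty lines until the next quoted line.
            while (
                j < len(email_lines)
                and quote_depth(email_lines[j]) == 0
                and email_lines[j].strip()
            ):
                found.append(email_lines[j].strip())
                j += 1
            i = j
        else:
            i += 1
    return found


# Lines patterned on Email-5 in Figure 3.10.
email5 = [
    ">>>Question Q1",
    ">Answer A12",
    ">>Answer A11",
    ">>>Question Q2",
    "Answer A22",
]
print(new_text_after_quote(email5, "Question Q2"))  # ['Answer A22']
```

Although A22 is four emails away from the email that posed Q2, it is found immediately because it sits next to the quoted copy of Q2.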

Although the supervised methods we have discussed so far have generated very useful insights into the task of dialogue act labeling of text conversations, they require large amounts of annotated training data, which are not only difficult and extremely time-consuming to build but also need to be created anew for each conversational modality. By comparison, semi-supervised methods represent a valid alternative, since they can be easily applied to a new domain, as long as a considerable amount of unlabeled data is available in that domain.
