F-score for the different dialogue acts ranging from .44 to .85. As for the features used by the classifiers,
the best performance was achieved with a rich feature set, which included features based on the
identification of time and date expressions, part-of-speech tags, and bigrams. For a clear illustration of
why bigrams help in this task, consider the bigrams “I will” and “will you”. While these two
bigrams strongly indicate a commitment and a request dialogue act, respectively, the three
constituent words, “I”, “will”, and “you”, in isolation, are much less informative.
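As a minimal sketch of this idea (not the authors' actual feature pipeline), the snippet below extracts unigram and bigram counts with scikit-learn's CountVectorizer, so that cues such as “I will” and “will you” become distinct features available to a dialogue act classifier; the example messages are invented.

# Toy illustration of unigram vs. bigram features using scikit-learn
# (not the authors' exact feature set).
from sklearn.feature_extraction.text import CountVectorizer

messages = [
    "I will send the report by Friday.",   # wording typical of a Commit act
    "Will you send me the report?",        # wording typical of a Request act
]

# ngram_range=(1, 2) yields both single words and word pairs; the custom
# token_pattern keeps one-letter words such as "I", which the default drops.
vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(messages)

print(sorted(f for f in vectorizer.get_feature_names_out() if "will" in f))
# ['i will', 'will', 'will send', 'will you'] -- the bigrams 'i will' and
# 'will you' separate the two acts, while the unigram 'will' alone does not.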
A key limitation of Cohen et al.'s proposal is that it does not exploit the tendency of dialogue
acts to occur in adjacency pairs. It blindly classifies one email message at a time, without considering
dependencies between a message and its neighboring messages in the email thread. The same research
group addressed this limitation the following year in Carvalho and Cohen [2005], where they present
an iterative collective classification algorithm⁸ in which two classifiers are trained for each dialogue
act d_i. One classifier, Content_{d_i}, only looks at the content of the message (it is the same classifier
presented in Cohen et al. [2004]), whereas the other classifier, Context_{d_i}, takes into account both
the content of the message and the context in which the message occurs, i.e., the dialogue act labels
of its parent and children. The algorithm works as follows.
1. Initialize the labels of each message by applying the Content classifiers (which do not need
labels for the other messages).
2. Repeat for a given number of iterations (60 in the proposal).
• Revise the labels of all the messages by applying to each message all the Context classifiers.
Figure 3.9 illustrates the algorithm's key operations.
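The following sketch spells out this loop in Python. The thread object, the per-act Content and Context classifiers, and the feature-building helpers are hypothetical placeholders; only the control flow mirrors the description above, and whether labels are revised in place or synchronously once per iteration is left open here.

def collective_classify(thread, content_clfs, context_clfs, n_iters=60):
    # Step 1: initialize each message's labels with the content-only
    # classifiers, which do not need labels on neighboring messages.
    labels = {
        msg.id: {act: clf.predict(msg.content_features())
                 for act, clf in content_clfs.items()}
        for msg in thread.messages
    }

    # Step 2: for a fixed number of iterations (60 in the proposal), revise
    # every message's labels with the context classifiers, which also see
    # the current labels of the message's parent and children.
    for _ in range(n_iters):
        revised = {}
        for msg in thread.messages:
            neighbor_labels = [labels[n.id] for n in msg.parent_and_children()]
            revised[msg.id] = {
                act: clf.predict(msg.content_features(), neighbor_labels)
                for act, clf in context_clfs.items()
            }
        labels = revised
    return labels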
Experimental results show that taking context into account does improve performance.
However, the improvements are modest and limited to some of the dialogue acts, which suggests that
exclusively supervised approaches to email dialogue act labeling may not be the ideal solution.
Similar results are obtained by Shrestha and McKeown [2004], who propose a supervised
approach for a rather different dialogue act labeling task. Instead of labeling each message in an
email thread with a subset of the labels in a tagset, they only determine whether any two sentences
in the thread form a question-answer adjacency pair. On the one hand, this is a more complex task,
because it operates at a finer level of granularity (single sentences vs. whole messages), but on the
other hand, it is a simpler task because it is limited to identifying only two dialogue acts.
In their work, the detection of question-answer pairs is broken down into two steps: first,
identify all the questions in the thread; next, for each question, detect the
corresponding answers. Let us examine these two steps in order.
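A schematic rendering of this two-step pipeline is given below. The names detect_question and is_answer_to stand in for the two trained classifiers and are hypothetical, not Shrestha and McKeown's actual models, and the assumption that answers follow their questions in thread order is ours.

def find_qa_pairs(sentences, detect_question, is_answer_to):
    # Step 1: identify the question sentences in the thread.
    questions = [s for s in sentences if detect_question(s)]

    # Step 2: for each question, keep the later sentences that the second
    # classifier judges to be answers to it.
    pairs = []
    for q in questions:
        for s in sentences:
            if s.position > q.position and is_answer_to(q, s):
                pairs.append((q, s))
    return pairs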
On the surface, it may appear that determining whether or not a sentence is a question should be
straightforward in written conversations such as email, because of the use of the question mark.
However, Shrestha and McKeown [2004] discuss three reasons why relying on question marks is
not sufficient.
⁸ This algorithm is an implementation of a Dependency Network [Heckerman et al., 2001].