Table 1. List of predefined main categories adequate for our dialog scenario

Main Category
Science                   Economics
Family                    Education
Studies                   Literature
Mass media                Music
Arts                      Health
Ecology                   Digital media
Sports                    Occupations
Fashion                   Food and drink
Leisure                   Transport
Intimate relationships    Regions
The next step is to preprocess the corpus: incomplete sentences and expressions are
completed so that the recorded utterances match the conditions of our scenario, in which
the human's utterances are entered via keyboard. We will then carry out the evaluation
by automatically identifying the dialog topics and topic shifts within the CUBE-G
interactions with our proposed method and comparing the results with the manual
annotations included in the corpus. If the performance is promising, a user study
evaluating the application of emulated human topic awareness in the conversational
behavior of the agent Max will be scheduled next.
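The planned comparison between automatically detected topic shifts and the manual annotations can be illustrated with a minimal sketch. The text does not say which measure will be used, so exact-match precision, recall, and F1 are an assumption here, and the function name `shift_scores` and the utterance indices are hypothetical illustration data, not taken from the CUBE-G corpus.

```python
def shift_scores(predicted, annotated):
    """Precision, recall and F1 for exact-match topic-shift positions.

    Both arguments are collections of utterance indices at which a new
    topic is assumed to start (hypothetical representation).
    """
    predicted, annotated = set(predicted), set(annotated)
    hits = len(predicted & annotated)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up example: indices of utterances that open a new topic.
manual = [0, 4, 9, 15]          # manual annotation
automatic = [0, 4, 10, 15, 18]  # output of an automatic method
precision, recall, f1 = shift_scores(automatic, manual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a shift detected one utterance late counts as both a miss and a false alarm. A tolerance window or a segmentation measure might be more appropriate in practice.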
Related Work
A lot of work has been carried out on offline topic identification. A prevalent model was
developed in the context of the Topic Detection and Tracking (TDT) research program
[20]. Within the TDT research, Allan defined five tasks (i.e., Story Segmentation,
First Story Detection, Cluster Detection, Tracking, and Story Link Detection) for
detecting the individual topics covered in a text-based newscast. Further offline
approaches compute the coherence between documents via similarity measures (e.g., [21,22]).
Others rank Wikipedia articles according to their relevance to a given text fragment, for
example via text classification algorithms [13] or by simply exploiting the Wikipedia
article titles and categories [23]. One recent approach uses the Wikipedia category
network as a conceptual taxonomy and derives a directed acyclic graph for each document
by mapping its terms to concepts in the category network [24].
Approaches for the online identification of topics in natural language dialogs are
rare. One work realizing a “Dynamic Topic Tracking” of natural language conversations
between a human and a robot roughly adapted the five tasks from the TDT project
(see above) to make the robot more situation-aware in human-robot interaction [25].
There, the number of topics and the corresponding topic names are created dynamically:
the names are taken from the content words that occur most frequently in the dialog
utterances. In contrast, existing taxonomies can serve as a source for topic labels, for
example ones derived from the online encyclopedia Wikipedia [8,16]. Furthermore,
conversation clusters visually highlight topics discussed in conversations using
Explicit Semantic Analysis based on Wikipedia articles [26].
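The idea of naming topics dynamically after the most frequent content words can be sketched as follows. This is only an illustration of the general principle, not the implementation of [25]: the stop-word list, the function name `topic_label`, and the sample utterances are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny hand-made stop-word list (an assumption; a real system would use a
# proper list or part-of-speech tagging to select content words).
STOPWORDS = {"a", "an", "the", "i", "you", "do", "my", "was", "is",
             "to", "of", "and", "every", "it"}


def topic_label(utterances, n=2):
    """Return the n most frequent non-stop-words as a topic label."""
    words = []
    for utterance in utterances:
        tokens = re.findall(r"[a-z]+", utterance.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in Counter(words).most_common(n)]


# Made-up dialog segment.
utterances = [
    "Do you like football?",
    "I watch football every weekend.",
    "My favourite football team won the match.",
    "The match was great.",
]
label = topic_label(utterances)
print(label)
```

Labels produced this way depend entirely on the wording of the dialog, whereas a taxonomy-based approach maps utterances onto a fixed set of category names.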