conversations, calls for a dynamic handling of previously unknown contributions. This
in turn presumes access to a large number of topics that have not been learned in
advance, together with the correlations between them. Because of this dynamic
requirement, and for further reasons given below, the online encyclopedia Wikipedia
proved to be the ideal knowledge source.
3.1 Topics Provided by Wikipedia
According to our definition, dialog topics are categories that subsume a sequence of
dialog contributions. The Wikipedia category system consists of categories that subsume
articles written as natural language texts. The similarity between utterance-topic
relations in dialogs and article-category links in Wikipedia forms the basis of our
dynamic topic detection approach. Generally speaking, we identify a dialog topic by
mapping the individual utterances to Wikipedia articles and treating their shared
Wikipedia categories as potential topics. Thus, the detection process is capable of
identifying a topic t without a priori knowledge of the domain underlying t.
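The mapping from utterances to shared categories can be illustrated with a minimal sketch. It is not the authors' implementation: the article-category table is hypothetical toy data standing in for Wikipedia's real links, and the function name `candidate_topics` and the `min_support` threshold are assumptions introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy data standing in for Wikipedia's article-category links.
ARTICLE_CATEGORIES = {
    "Swimming": {"Water sports", "Olympic sports"},
    "Diving": {"Water sports", "Olympic sports"},
    "Sailing": {"Water sports", "Transport"},
}


def candidate_topics(articles, article_categories, min_support=2):
    """Return (category, count) pairs for categories shared by at least
    `min_support` of the given articles, most widely shared first.

    `min_support` is an assumed threshold, not a value from the text.
    """
    counts = Counter()
    # Deduplicate so an article mentioned twice is counted only once.
    for article in set(articles):
        counts.update(article_categories.get(article, set()))
    shared = [(cat, n) for cat, n in counts.items() if n >= min_support]
    # Sort by descending support, then alphabetically for a stable order.
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Articles assumed to have been matched to three successive utterances.
topics = candidate_topics(["Swimming", "Diving", "Sailing"], ARTICLE_CATEGORIES)
```

In this toy example the category shared by all three articles ranks first, and no category is fixed in advance. This mirrors the claim that a topic can be detected without prior knowledge of its domain.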
A major advantage of using Wikipedia for our purpose is that its encyclopedic
knowledge is constructed collaboratively by numerous volunteers. Hence, Wikipedia
provides huge amounts of information that is maintained by others. Furthermore, the
resulting description and categorization of concepts reflect the participants'
perception of conceptual structures and deliver insights into the human understanding
of topics and their relations.
3.2 Online Detection
Within our approach, automatic topic detection mainly involves implementing automatic
processes that identify potential topics, track ongoing topics, detect topical shifts,
and label the coherent dialog sequences. For topic detection to work online, the first
two tasks need to be performed continuously, that is, on every incoming utterance. At
the same time, their outcomes affect the remaining processes. In the following, the
individual tasks are described in more detail. Additionally, Figure 1 gives an overview
of the presented topic detection approach and illustrates the relations between its
associated processes.
Identification of Potential Topics. According to Schank (1977), an utterance made in
response to an input provides both a conceptual intersection with the present dialog
topic and a new conceptualization that introduces potential new topics. Accordingly, to
automatically identify potential topic directions, every single dialog contribution
first has to be conceptualized by identifying the concept terms it contains. To this
end, the system first preprocesses the present utterance with the Stanford
Part-Of-Speech Tagger [12]. Afterwards, all identified nouns and proper nouns are
specified as concept terms.
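The noun-selection step can be sketched as a simple filter over tagged tokens. The sketch assumes the utterance has already been tagged and that the tagger emits Penn Treebank tags. The example tags are written by hand rather than produced by the Stanford tagger, and the names `NOUN_TAGS` and `concept_terms` are hypothetical.

```python
# Penn Treebank tags for common nouns (NN, NNS) and proper nouns (NNP, NNPS).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def concept_terms(tagged_utterance):
    """Keep the tokens tagged as nouns or proper nouns, in utterance order."""
    return [token for token, tag in tagged_utterance if tag in NOUN_TAGS]


# Hand-tagged example utterance (assumed tagger output, for illustration only).
tagged = [
    ("I", "PRP"), ("went", "VBD"), ("swimming", "VBG"), ("in", "IN"),
    ("the", "DT"), ("lake", "NN"), ("near", "IN"), ("Berlin", "NNP"),
]
terms = concept_terms(tagged)
```

Verbs such as "swimming" are deliberately left out by this filter, since they are handled in a separate step.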
Moreover, the system extracts the verbs contained in the present utterance and
transforms them into their substantive forms, since these may provide conceptual
information as well. For this purpose, we make use of the online dictionary Wiktionary.
Then, the system searches for a Wikipedia article giving a concept description for the
substantive. If a corresponding article can be found, as is the case for the term
“swimming”, the substantive
 