One concept term can be related to more than one topic, although to varying extents.
Within our approach, the automatic assignment of concepts to topics is implemented
by mapping all concept terms to a set of predefined Wikipedia categories. To this end, a
number of Wikipedia categories that best represent the topics likely to be addressed
in the given dialog scenario have to be specified in advance. In principle, every category
contained in the Wikipedia category system can be considered a potential
dialog topic. However, it is advisable to choose categories with a high degree of
abstraction, since these best reflect general topic areas such as “Sports” or “Politics”.
Subsequently, for every chosen category all subordinate Wikipedia articles are
extracted, that is, all articles assigned to the considered category or to at least one of its
subcategories. Afterwards, the relevant information is stored in a second Lucene
index. More precisely, one document is set up for every predefined Wikipedia category,
with fields for the category title and for the titles and textual contents of its
subordinate articles. Articles that are related to a predefined category several times
are included correspondingly often in the category document to boost their importance
within the represented topic area.
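The construction of a category document can be sketched as follows. This is a minimal illustration, not the authors' implementation: the category graph and article texts are toy data, the helper names (`collect_articles`, `build_category_document`) and field names are hypothetical, and the result is a plain dictionary instead of a Lucene document. An article is counted once for every assignment to the chosen category or to one of its subcategories, and each category is visited only once so that cycles in the category graph cannot cause endless traversal (an assumption of this sketch).

```python
from collections import Counter

def collect_articles(root, subcats, members):
    """Count how often each article is assigned under `root`.

    `subcats` maps a category to its direct subcategories and `members`
    maps a category to its directly assigned articles (toy structures).
    """
    counts = Counter()
    seen = set()
    stack = [root]
    while stack:
        cat = stack.pop()
        if cat in seen:          # visit each category once, guards against cycles
            continue
        seen.add(cat)
        counts.update(members.get(cat, []))
        stack.extend(subcats.get(cat, []))
    return counts

def build_category_document(root, subcats, members, texts):
    """Build one document per category; repeated articles appear repeatedly."""
    counts = collect_articles(root, subcats, members)
    titles, contents = [], []
    for article, n in sorted(counts.items()):
        titles.extend([article] * n)
        contents.extend([texts.get(article, "")] * n)
    return {
        "title": root,                            # hypothetical field names
        "article_titles": " ".join(titles),
        "article_contents": " ".join(contents),
    }

# Toy data: "Football" is assigned to "Sports" and to its subcategory.
subcats = {"Sports": ["Ball games"], "Ball games": ["Sports"]}
members = {"Sports": ["Football"], "Ball games": ["Football", "Tennis"]}
texts = {"Football": "goal league", "Tennis": "racket serve"}

doc = build_category_document("Sports", subcats, members, texts)
```

In this toy example the contents of "Football" enter the "Sports" document twice, so a term-frequency-based score for that category rises accordingly.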
To retrieve a list of categories representing possible topics, sorted in descending
order of their relatedness to the concept term cterm, we search the index for
each category document d matching cterm in a query q on the basis of the scoring
formula presented in equation 2. As a result, each concept term of the present utterance is
represented as a vector within a space of predefined Wikipedia categories constituting
potential dialog topics. For the rest of the paper, we refer to these vectors, which capture
the relative importance of the dialog topics for the considered concept term, as concept
topic vectors.
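As a rough illustration of how a concept topic vector arises, the sketch below scores a concept term against every category document and ranks the categories. The scoring function is only a crude stand-in (term count divided by the square root of the document length); it is not the Lucene scoring formula of equation 2, and all names and the toy documents are assumptions made for this example.

```python
import math

def score(term, text):
    """Stand-in relevance score; NOT the scoring formula of equation 2."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / math.sqrt(len(tokens))

def concept_topic_vector(term, category_docs):
    """Map every predefined category to the score of `term` in its document."""
    return {cat: score(term, text) for cat, text in category_docs.items()}

def ranked_topics(vector):
    """Categories with a non-zero score, sorted by descending relatedness."""
    matches = [(cat, s) for cat, s in vector.items() if s > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Toy category documents (hypothetical contents).
category_docs = {
    "Sports": "football match goal football league",
    "Politics": "election parliament football stadium funding",
    "Music": "concert album",
}

vec = concept_topic_vector("football", category_docs)
ranking = ranked_topics(vec)
```

Here `vec` has one component per predefined category, and `ranking` lists only the matching categories, most related first.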
Identification of Dialog Topics. As stated before, a dialog topic is established
consensually by both conversation participants. That is, single utterances do not have
topics in isolation but rather provide topic suggestions [7]. Based on this idea, we have to
consider at least two successive utterances to define a topical intersection. Accordingly,
the topic tracking process begins with the second dialog contribution.
To detect topical overlaps between two successive dialog contributions, we compare
each of the concept topic vectors specified for one utterance with each of the concept
topic vectors of the subsequent utterance separately using the cosine similarity. That is,
we quantify the similarity between two concept terms $cterm_1$ and $cterm_2$ of
successive utterances $utt_1$ and $utt_2$ on the basis of their concept topic vector
representations $V(cterm_1)$ and $V(cterm_2)$ via

\[
sim(cterm_1, cterm_2) = \frac{V(cterm_1) \cdot V(cterm_2)}{|V(cterm_1)| \, |V(cterm_2)|} \tag{3}
\]

where $cterm_1 \in utt_1$ and $cterm_2 \in utt_2$.
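The pairwise comparison based on equation 3 can be sketched as follows. This is a minimal illustration under stated assumptions: vectors are plain lists over the same ordered set of categories, the function and variable names are hypothetical, the example values are invented, and returning 0.0 for a zero vector is a choice of this sketch that the text does not specify.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine similarity of two concept topic vectors (equation 3)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must cover the same categories")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: a term matching no category overlaps with nothing
    return dot / (norm1 * norm2)

def pairwise_similarities(vectors_utt1, vectors_utt2):
    """Compare every concept term of utt_1 with every concept term of utt_2."""
    return {
        (t1, t2): cosine_similarity(v1, v2)
        for t1, v1 in vectors_utt1.items()
        for t2, v2 in vectors_utt2.items()
    }

# Invented vectors over the categories ("Sports", "Politics").
utt1 = {"football": [0.9, 0.1]}
utt2 = {"goalkeeper": [0.8, 0.2], "election": [0.1, 0.9]}

sims = pairwise_similarities(utt1, utt2)
```

With these invented values, the pair ("football", "goalkeeper") scores far higher than ("football", "election"), since the first two vectors point in nearly the same direction in category space.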
If the comparison detects a significant similarity between two concept topic
vectors, that is, a similarity higher than a given threshold (currently set
to 0.5), a topical overlap between $utt_1$ and $utt_2$ is identified. For every topical
overlap, the involved concept topic vectors are summed up, resulting in a new vector, called