Talking Topically to Artificial Dialog Partners: Emulating Humanlike Topic Awareness in a Virtual Agent - Agents and Artificial Intelligence

One concept term can be related to more than one topic, although to varying extents. Within our approach, the automatic assignment of concepts to topics is implemented by mapping all concept terms to a set of predefined Wikipedia categories. To this end, a number of Wikipedia categories that best represent the topics possibly addressed in the given dialog scenario has to be specified in advance. In principle, every category contained in the Wikipedia category system can be considered a potential dialog topic. It is advisable, however, to choose categories with a high degree of abstraction, as these best reflect more general topic areas such as “Sports” or “Politics”.

Subsequently, for every chosen category all subordinated Wikipedia articles are extracted, that is, all articles assigned to the considered category or to at least one of its subcategories. Afterwards, the relevant information is stored in a second Lucene index. More precisely, one document is set up for every predefined Wikipedia category, with fields for the category title and for the titles and textual contents of its subordinated articles. Articles that are related to a predefined category several times are included in the category document correspondingly often, which boosts their importance within the represented topic area.
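The construction of a category document can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the in-memory dictionaries `subcategories`, `members` and `texts` are hypothetical stand-ins for the Wikipedia category system and article store, and a plain dictionary stands in for a Lucene document. The sketch only shows how an article assigned to a category several times ends up in the category document correspondingly often.

```python
from collections import Counter


def collect_articles(category, subcategories, members, _seen=None):
    """Count how often each article is assigned within `category` and its subcategories."""
    seen = set() if _seen is None else _seen
    if category in seen:
        # Assumption: each category is visited once, since the Wikipedia
        # category graph is not a strict tree and may contain cycles.
        return Counter()
    seen.add(category)
    counts = Counter(members.get(category, []))
    for sub in subcategories.get(category, []):
        counts += collect_articles(sub, subcategories, members, seen)
    return counts


def build_category_document(category, subcategories, members, texts):
    """Assemble one document per category; repeated articles appear repeatedly."""
    counts = collect_articles(category, subcategories, members)
    titles, contents = [], []
    for article, n in sorted(counts.items()):
        titles.extend([article] * n)
        contents.extend([texts.get(article, "")] * n)
    return {
        "title": category,
        "article_titles": titles,
        "article_contents": contents,
    }


# Hypothetical toy data: "Football" is assigned to "Sports" directly
# and again through the subcategory "Ball games".
subcategories = {"Sports": ["Ball games"], "Ball games": []}
members = {"Sports": ["Football"], "Ball games": ["Football", "Tennis"]}
texts = {
    "Football": "football is a team sport",
    "Tennis": "tennis is a racket sport",
}

doc = build_category_document("Sports", subcategories, members, texts)
```

In this toy setup the title and text of "Football" are stored twice in the "Sports" document, so any term-frequency-based scoring gives that article more weight than "Tennis".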

To retrieve a list of categories representing possible topics, sorted in descending order according to their relatedness to the concept term cterm, we search the index for each category document d matching cterm in a query q on the basis of the scoring formula presented in equation 2. As a result, each concept term of the present utterance is represented as a vector within a space of predefined Wikipedia categories constituting potential dialog topics. For the rest of the paper, we refer to these vectors, which capture the relative importance of the dialog topics for the considered concept term, as concept topic vectors.
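To make the idea of a concept topic vector concrete, the following sketch scores one concept term against every category document and returns both the resulting vector and the ranked category list. The scoring here is a deliberately simple stand-in (relative term frequency); it is not the scoring formula of equation 2, and the function names and toy documents are assumptions made for illustration only.

```python
def concept_topic_vector(cterm, category_docs):
    """Map a concept term to one score per predefined category.

    Assumption: relative term frequency replaces the index-based
    scoring formula, and concept terms are single lowercase tokens.
    """
    term = cterm.lower()
    vector = {}
    for category, text in category_docs.items():
        tokens = text.lower().split()
        vector[category] = tokens.count(term) / len(tokens) if tokens else 0.0
    return vector


def ranked_topics(vector):
    """Return the categories in descending order of relatedness."""
    return sorted(vector, key=vector.get, reverse=True)


# Hypothetical category documents (flattened article contents).
category_docs = {
    "Sports": "football football tennis goal",
    "Politics": "election parliament football vote",
}

vec = concept_topic_vector("football", category_docs)
ranking = ranked_topics(vec)
```

Each concept term thus yields one score per predefined category, so all concept topic vectors share the same dimensions and can be compared directly.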

Identification of Dialog Topics. As stated before, a dialog topic is established consensually by both conversation participants. That is, a single utterance does not have a topic in isolation but rather provides topic suggestions [7]. Based on this idea, we have to consider at least two successive utterances to define a topical intersection. Accordingly, the topic tracking process begins with the second dialog contribution.

To detect topical overlaps between two successive dialog contributions, we compare each of the concept topic vectors specified for one utterance with each of the concept topic vectors of the subsequent utterance separately using the cosine similarity. That is, we quantify the similarity between two concept terms cterm_1 and cterm_2 of successive utterances utt_1 and utt_2 on the basis of their concept topic vector representations V(cterm_1) and V(cterm_2) via

\[
\mathrm{sim}(cterm_1, cterm_2) = \frac{V(cterm_1) \cdot V(cterm_2)}{|V(cterm_1)|\,|V(cterm_2)|} \tag{3}
\]

where cterm_1 ∈ utt_1 and cterm_2 ∈ utt_2.
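The pairwise comparison of equation 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: concept topic vectors are plain lists of scores over the same ordered set of categories, the function names are hypothetical, and a zero vector (for which the formula is undefined) is treated as having similarity 0.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equally long score vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        # Assumption: a term unrelated to every category matches nothing.
        return 0.0
    return dot / (norm1 * norm2)


def pairwise_similarities(utt1_vectors, utt2_vectors):
    """Compare every concept term of utt_1 with every concept term of utt_2."""
    return {
        (c1, c2): cosine_similarity(v1, v2)
        for c1, v1 in utt1_vectors.items()
        for c2, v2 in utt2_vectors.items()
    }


# Hypothetical vectors over the categories ("Sports", "Politics").
utt1_vectors = {"football": [0.5, 0.25]}
utt2_vectors = {"goal": [1.0, 0.0], "election": [0.0, 1.0]}

sims = pairwise_similarities(utt1_vectors, utt2_vectors)
```

Because the cosine depends only on the direction of the vectors, two concept terms count as similar when they distribute their weight over the categories in the same proportions, regardless of the absolute size of their scores.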

If the comparison detects a significant similarity between two concept topic vectors, that is, a similarity higher than a given threshold (currently set to 0.5), a topical overlap between utt_1 and utt_2 is identified. For every topical overlap, the involved concept topic vectors are summed up, resulting in a new vector, called
