10.1.1. A few assessment experiments
Let us return, as in the scenarios of section 3.1, to a generic designer:
whether at the final design stage or midway through, our system designer
will inevitably face the issue of assessment. Given the simplifications
made to the linguistic and pragmatic theories, does the system perform
well enough? More generally, simplifications or not, does the system react
satisfactorily? Does each module fulfill its role? Is the architecture
suited to the processes it carries out? Does the system's behavior match
the original idea underlying the directions given to the Wizard of Oz?
Obviously, the first idea that comes to mind once the system is operational
is to carry out user tests. This is often when the designer's morale is put
to the test: between the subjects who do not understand how to use the
microphone and its push-to-talk button or pedal; those who cannot manage
the touch screen; those who do not control their actions on the hardware
and even go so far as to damage it; those who repeat each gesture three
times, or each part of a sentence three times, for fear that the system
might have missed some of it; those who express themselves so spontaneously
that their sentences are teeming with parenthetical clauses, relative
subordinate clauses, hesitations and corrections; those who are so
intimidated by the system that they express themselves in a telegraphic
style; and above all those who stray outside the predefined application
framework. A single word that was not anticipated at the outset is enough
to trigger the despairing "I did not understand..." reply, a situation
experienced when assessing the Ozone project: its multimodal train
reservation system integrated a complete model of trains, train stations
and timetables, but its lexicon did not contain the word "tomorrow", which
one of the assessors uttered.
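Failures like the missing "tomorrow" can be caught before live testing by a
simple coverage check over pilot transcripts. The following is a minimal
sketch, assuming the lexicon is available as a plain-text word list and the
Wizard of Oz transcripts as one utterance per line; the file names and the
naive tokenization are hypothetical, not taken from the Ozone system:

```python
# Minimal out-of-vocabulary (OOV) check: a hypothetical sketch, assuming
# the system lexicon is a plain-text word list (one entry per line) and
# the Wizard of Oz transcripts are one utterance per line. File names,
# tokenization and normalization are illustrative assumptions.
import re
from collections import Counter

def load_lexicon(path):
    """Load the system's word list into a set for fast membership tests."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def oov_report(transcript_path, lexicon):
    """Count every transcript word that the lexicon does not cover."""
    oov = Counter()
    with open(transcript_path, encoding="utf-8") as f:
        for utterance in f:
            # Naive tokenization: lowercase alphabetic words only.
            for word in re.findall(r"[a-zA-Z']+", utterance.lower()):
                if word not in lexicon:
                    oov[word] += 1
    return oov

if __name__ == "__main__":
    lexicon = load_lexicon("lexicon.txt")            # hypothetical file
    report = oov_report("woz_transcripts.txt", lexicon)
    # A word like "tomorrow" would surface here before a live assessment.
    for word, count in report.most_common(20):
        print(f"{word}\t{count}")
```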
Faced with this observation, the designer then gives the subjects a
briefing on man-machine interaction, on what spontaneous dialogue specific
to the application is, on the way the system operates and, above all, on
the application domain and its scope. Ideally, the subjects then directly
produce valid utterances and the system performs much better. If this is
not the case, a training session (which does not count toward the
assessment) can be considered. But then, what becomes of the assessment?
By repeating example sentences that the system can process, the subjects
are obviously led to utter those same sentences, and it becomes hard to
assess the spontaneous aspect of the dialogue.
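One way to make this bias measurable is to count how many test utterances
merely echo the example sentences shown during training. The sketch below
assumes both corpora are available as plain text; the token-set Jaccard
similarity and the 0.8 threshold are illustrative choices of mine, not a
procedure taken from the source:

```python
# Hypothetical sketch: estimate how much of the test corpus merely echoes
# the example sentences shown during training. A high echo rate suggests
# the "spontaneous" aspect of the dialogue can no longer be assessed.
# File names and the 0.8 Jaccard threshold are illustrative assumptions.
def tokens(sentence):
    return set(sentence.lower().split())

def jaccard(a, b):
    """Token-set similarity between two sentences (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def echo_rate(examples, utterances, threshold=0.8):
    """Fraction of test utterances closely matching a training example."""
    example_sets = [tokens(e) for e in examples]
    echoed = sum(
        1 for u in utterances
        if any(jaccard(tokens(u), e) >= threshold for e in example_sets)
    )
    return echoed / len(utterances) if utterances else 0.0

if __name__ == "__main__":
    with open("training_examples.txt", encoding="utf-8") as f:  # hypothetical
        examples = [line.strip() for line in f if line.strip()]
    with open("test_utterances.txt", encoding="utf-8") as f:    # hypothetical
        utterances = [line.strip() for line in f if line.strip()]
    print(f"Echo rate: {echo_rate(examples, utterances):.0%}")
```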