10.1.3. Oral dialogue methodologies
More specifically for oral MMD, a number of methods have been proposed [ANT 99, DEV 04, DYB 04, WAL 05, MÖL 07, KÜH 12]. Together they form a kind of reference framework containing recommendations for implementing user interaction tests, methods for automatically or semi-automatically analyzing the interaction traces obtained, markers for defining the assessment metrics, and even principles for creating and analyzing the questionnaires filled out by the users. We thus find some of the methods used in MMI. Each system assessor can pick from this stock to determine which method(s) to apply. Indeed, a single test is generally insufficient: a genuine assessment needs to bring several tests together. The evaluation campaigns (Evalda/Media: an assessment methodology for understanding within and outside of the dialogue context), the working groups (the MadCow group, the speech understanding group, GdR I3) and the various European project consortia make wide use of this principle. When several systems are involved and the assessment is comparative, operational rules can be defined so as to better control the quality of the assessment. A challenge evaluation campaign, in which the management of the campaign is crossed with the roles of the designers of the systems involved [ANT 03], is one example.
Each of the methodology's main propositions comes with an original idea meant to simplify the implementation of a given type of test by providing a means to operationalize it in a specific context. The paradigm of the MadCow group [HIR 92] thus provides the notion of a template that characterizes the minimum and maximum answers to a query, thereby making its assessment more rigorous.
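A minimal sketch of this idea, under a set-based reading (the comparison and the example data below are illustrative assumptions, not the MadCow specification itself): an answer is accepted when it covers everything in the minimal reference answer and adds nothing beyond the maximal one.

    # Sketch of template-based answer checking in the spirit of the MadCow
    # minimum/maximum reference answers. The set-based comparison and the
    # example data are illustrative assumptions, not the original format.
    def accept(answer: set, minimal: set, maximal: set) -> bool:
        """Accept if the answer contains at least the minimal reference
        and nothing outside the maximal reference."""
        return minimal <= answer <= maximal

    # Query: "flights from Boston to Denver on Monday morning"
    minimal = {"UA101", "DL202"}            # any correct answer must list these
    maximal = minimal | {"AA303"}           # an answer may list at most these
    print(accept({"UA101", "DL202"}, minimal, maximal))           # True
    print(accept({"UA101"}, minimal, maximal))                    # False: missing item
    print(accept({"UA101", "DL202", "ZZ999"}, minimal, maximal))  # False: extra item

Bounding the answer on both sides is what makes the judgment rigorous: an under-informative answer and an over-generated one are both rejected mechanically, without appeal to a human judge.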
The PARADISE paradigm (PARAdigm for DIalogue System Evaluation) [WAL 01] focuses on maximizing the user's satisfaction and proposes taking satisfaction of the task as its reference.
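The usual PARADISE formulation estimates performance as a weighted task-success term minus a weighted sum of normalized dialogue costs; the sketch below assumes that formulation with illustrative weights and data (in PARADISE proper, the weights are fitted by regressing user satisfaction ratings on these measures).

    # Sketch of a PARADISE-style performance score:
    #   performance = alpha * N(task_success) - sum_i w_i * N(cost_i)
    # where N is z-score normalization. Data and weights are illustrative;
    # PARADISE fits the weights by linear regression against user
    # satisfaction questionnaires.
    from statistics import mean, stdev

    def z_norm(xs):
        """Z-score normalization so heterogeneous measures share one scale."""
        m, s = mean(xs), stdev(xs)
        return [(x - m) / s for x in xs]

    kappa      = [0.9, 0.6, 0.8, 0.4]   # task success per test dialogue
    n_turns    = [12, 20, 15, 30]       # efficiency cost: dialogue length
    asr_errors = [1, 4, 2, 6]           # quality cost: recognition errors

    alpha, w_turns, w_err = 0.5, 0.3, 0.2
    scores = [alpha * k - w_turns * t - w_err * e
              for k, t, e in zip(z_norm(kappa), z_norm(n_turns), z_norm(asr_errors))]
    for i, s in enumerate(scores):
        print(f"dialogue {i}: performance = {s:+.2f}")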
Another example of an original idea is [LÓP 03], which suggests assessing a system by automatically generating test user utterances, that is, by modeling the user's behavior, including his mistakes. In France, this method was taken up in the Simdial paradigm [ALL 07], in which the deterministic simulation of a user makes it possible to automatically assess the system's dialogic abilities, notably thanks to the notion of the disturbing phenomenon which, like the noise in the Wizard of Oz setup of [RIE 11], introduces protests or rephrasing requests that probe the system's general behavior and robustness.
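To make the idea concrete, here is a minimal sketch of a deterministic user simulation with injected disturbing phenomena; the scenario, the phenomena and the driver function are illustrative assumptions, not the actual Simdial implementation.

    # Sketch of a deterministic user simulation in the spirit of Simdial:
    # the scripted user works through a task scenario and, at fixed turns,
    # injects a "disturbing phenomenon" (a protest or a rephrasing request)
    # so that the logged trace shows how the system recovers. The scenario
    # and phenomena are illustrative assumptions.
    SCENARIO = ["book a table", "for two people", "tomorrow at eight"]
    DISTURBANCES = {1: "What do you mean?",             # rephrasing request
                    3: "No, that is not what I said."}  # protest

    def simulate(system_reply, scenario, disturbances, max_turns=10):
        """Drive any system (a callable str -> str) through the script.
        Deterministic: the same scenario always yields the same trace."""
        trace, step = [], 0
        for turn in range(max_turns):
            if turn in disturbances:
                user = disturbances[turn]       # inject the disturbance
            elif step < len(scenario):
                user = scenario[step]           # next scripted utterance
                step += 1
            else:
                break
            trace.append((user, system_reply(user)))
        return trace

    # A trivial echo system stands in for the dialogue system under test.
    for user, system in simulate(lambda u: f"You said: {u}", SCENARIO, DISTURBANCES):
        print(f"USER:   {user}\nSYSTEM: {system}")

Because the simulation is deterministic, two systems can be compared on exactly the same disturbed interaction, which is what makes the resulting robustness assessment reproducible.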
Moreover, the Data-Question-Response (DQR) methodology, see notably the chapter by J. Zeiliger et al. in [MAR 00],