internal semantic representations. Although the procedure appears clear, it
nevertheless raises a number of issues linked to the coding of multimodality.
At the rawest level, a multimodal corpus is a recording of the signals
captured by the system: an audio signal captured by the microphone, a
gesture signal tracked by the touch screen, a video signal tracked by one or
more cameras, etc. Although such a recording can be ideal for simulating
system input, it does not allow for any data manipulation (for instance, the
derivation of examples from an initial example, see section 3.2.2) and is hard
to characterize in terms of phenomena. For these purposes, corpus annotation,
i.e. the passage from a raw level to an interpreted level, is often an
indispensable operation. Yet the annotation of a multimodal corpus raises issues due to the
nature of the recorded signals. Unlike speech, which can be transcribed into
written sentences in a relatively simple and unbiased manner, gesture and
other modalities cannot be transcribed simply, as we saw in section 6.1.3.
These technical issues should not, however, make us lose sight of the fact
that we cannot do without corpora in MMD.
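
The distinction between the raw level and the interpreted level can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the fields, the labels and the derive_example helper are hypothetical and do not correspond to any annotation scheme described in this book. The sketch only shows why derivation of examples operates on annotations rather than on recorded signals.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class RawSignal:
    """Raw level: an uninterpreted recording from one input device (hypothetical)."""
    device: str      # e.g. "microphone", "touch_screen", "camera"
    samples: bytes   # opaque signal data, not meaningfully editable by hand


@dataclass(frozen=True)
class Annotation:
    """Interpreted level: a labelled time span on one modality (hypothetical)."""
    modality: str    # e.g. "speech", "gesture"
    start_s: float
    end_s: float
    label: str       # a transcription or a gesture category


@dataclass(frozen=True)
class CorpusEntry:
    signals: Tuple[RawSignal, ...]
    annotations: Tuple[Annotation, ...]


def derive_example(entry: CorpusEntry, modality: str,
                   old_label: str, new_label: str) -> CorpusEntry:
    """Derive a variant of an example by relabelling matching annotations.

    The derived entry carries no raw signals, because no recording
    exists for the variant: it lives only at the interpreted level.
    """
    new_annotations = tuple(
        replace(a, label=new_label)
        if a.modality == modality and a.label == old_label
        else a
        for a in entry.annotations
    )
    return CorpusEntry(signals=(), annotations=new_annotations)


# Illustrative data only: a spoken utterance accompanied by a gesture.
entry = CorpusEntry(
    signals=(RawSignal("microphone", b"\x00\x01"),
             RawSignal("touch_screen", b"\x02")),
    annotations=(Annotation("speech", 0.0, 1.2, "put that there"),
                 Annotation("gesture", 0.4, 0.6, "pointing")),
)
variant = derive_example(entry, "gesture", "pointing", "circling")
```

In this sketch the original entry is left unchanged, and the variant differs from it only in the gesture label. Producing the same variant at the raw level would require recording a new gesture signal.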
10.2.3. Can we compare several multimodal systems?
A third issue concerns the implementation of a comparative assessment
procedure. The point is to compare several dialogue systems with similar
abilities on the same type of application. However, in existing work
restricted to oral dialogue, the comparison rarely sets an oral dialogue
system against another type of reference system, for example a written
dialogue system. It would nevertheless be interesting to assess the
contribution of speech as a source of improved communication between users
and their machines, or of improved efficiency in task management. The
question arises especially when it comes to a
multimodal dialogue. Multimodal ability is often presented as an asset
compared with linguistic ability: multimodality is said to be more
efficient, quicker, more precise and more direct, especially for referring
actions, which give direct access to the objects (without going through
complex and potentially ambiguous spatial descriptions). A comparative
assessment procedure should thus include oral systems as well as multimodal
systems. Moreover, and this is especially true in professional fields,
multimodality is also presented as an asset compared to the classical