act, with, for example, a specific intensity for a “telling to”, human factors and
individual preferences expressed by the user. The highlighting in itself can
take on a variety of shapes, especially for a message in natural language. It
can involve information structure, syntactic constructions such as cleft
sentences and topicalizations, and a variety of prosodic processes (a pause
before and after the emphasized element, for example). At the level of ECA
management, specific renditions are implemented. When it comes to data
such as a geographical map or a table of numbers, the use of color, the
relative size of elements and Gestalt criteria all need to be considered.
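As an illustration of the prosodic highlighting mentioned above (a pause before and after the emphasized element), here is a minimal sketch in Python that produces SSML markup. The function name `highlight_ssml`, its parameters and the default pause length are assumptions made for this example; only the `speak`, `break` and `emphasis` elements come from the SSML standard, and a given synthesizer may render them differently.

```python
from xml.sax.saxutils import escape


def highlight_ssml(words, focus_index, pause_ms=200):
    """Return an SSML string in which words[focus_index] is emphasized
    and framed by a short pause on each side (hypothetical helper)."""
    if not 0 <= focus_index < len(words):
        raise IndexError("focus_index is out of range")
    pause = f'<break time="{pause_ms}ms"/>'
    parts = []
    for i, word in enumerate(words):
        text = escape(word)  # protect &, < and > inside the markup
        if i == focus_index:
            # Pause, emphasized element, pause
            parts.append(
                f'{pause}<emphasis level="strong">{text}</emphasis>{pause}'
            )
        else:
            parts.append(text)
    return "<speak>" + " ".join(parts) + "</speak>"
```

Calling `highlight_ssml(["the", "train", "leaves", "at", "nine"], 4)` frames "nine" with two 200 ms pauses and marks it as strongly emphasized. Syntactic means of highlighting, such as cleft sentences, would have to be decided earlier, when the word sequence itself is generated.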
In a natural dialogue in natural language, the rendition of the system's
voice is an essential aspect, and one that can put the user off, as we saw in
section 1.3.2 with the uncanny valley issue. Broadly, and in a rather
schematic way, we can currently get a system to pronounce any kind of text or
utterance with a quality close to that of a human who does not understand the
semantic content. What is still lacking is the ability to take contextual aspects
into account, for example in rendering the nuances that reveal a fine
understanding. Chaudiron [CHA 04] shows that text-to-speech systems
have evolved along the following path: type 1 systems, able
to regurgitate prerecorded messages; type 2 systems, managing simple
messages built with a set vocabulary through concatenation methods; type 3
systems, able to carry out a genuine synthesis from a text; and type 4 systems
with a visual component. The paradox of text to speech in MMD is that
this process is often seen as a module to be run at the end of the chain, i.e.
with little connection to the prosodic and linguistic analysis processes, whereas
current implementations show that, to pronounce an utterance correctly, the system
must understand its meaning and master prosody so that it can, if needed,
align itself with the user. The phenomena of accentuation, prosodic
prominence, intonation and rhythm are among the concerns of text to
speech (see Chapter 11 of [COH 04]). We have emphasized how important the
“how to say it” phase is in a dialogue, for reasons of coherence
and cohesion. Moreover, once the word sequence making up the message in
natural language has been decided upon, it is mostly prosody that will allow
us to control its synthesis. Theune [THE 02] shows, for example, that a deep
specification of prosodic directives is essential and proposes a generation
model, not necessarily aimed at dialogue, in which several processes
run in sequence to enrich the text to be spoken into an annotated text, which
then serves as input to the text-to-speech phase.
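The idea of a chain of processes that enrich a text with prosodic annotations can be sketched as follows. This is not Theune's model: the two stages, their rules (an accent on new information, a boundary at punctuation) and the output notation are hypothetical and chosen only to show how each stage adds its own annotations before the result is handed to a synthesizer.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    new_info: bool = False  # assumed input: the word carries new information
    accent: bool = False    # filled in by the accentuation stage
    boundary: str = ""      # filled in by the phrasing stage: "", "minor", "major"


def mark_accents(tokens):
    # Hypothetical rule: accent the words that carry new information.
    for t in tokens:
        t.accent = t.new_info
    return tokens


def mark_boundaries(tokens):
    # Hypothetical rule: minor boundary after a comma, major boundary
    # after sentence-final punctuation.
    for t in tokens:
        if t.text.endswith(","):
            t.boundary = "minor"
        elif t.text.endswith((".", "?", "!")):
            t.boundary = "major"
    return tokens


def enrich(tokens, stages=(mark_accents, mark_boundaries)):
    # Run the stages one after another; each adds its own annotations.
    for stage in stages:
        tokens = stage(tokens)
    return tokens


def render(tokens):
    # Ad hoc notation: *word* for an accent, | and || for boundaries.
    marks = {"minor": " |", "major": " ||"}
    parts = []
    for t in tokens:
        word = f"*{t.text}*" if t.accent else t.text
        parts.append(word + marks.get(t.boundary, ""))
    return " ".join(parts)
```

With the tokens `the`, `flight,`, `leaves`, `at` and `nine.` (the last one flagged with `new_info=True`), `render(enrich(tokens))` returns `the flight, | leaves at *nine.* ||`. Keeping each stage as a separate function means further processes can be added to the chain without changing the others.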