signal processing, and mathematical modeling suited to representing
configurations and trajectories, all under constraints of execution speed,
precision and abstraction, in representations that the system can manipulate
efficiently, so that the researcher can compare them with the representations
produced by automatic utterance understanding. As Bellalem and Romary
[BEL 96] show for gesture trajectories drawn on a touch screen, representing a
gesture as a sequence of several hundred positions is simply unmanageable.
Regularities and significant instants must be abstracted from it to obtain, for
example, a curve that can be described by four or five parameters. If this
curve is then used to help resolve a reference to an object, it can be compared
with a (similarly simplified) representation of the visual scene and the objects
that appear in it.
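To make the idea of abstracting a trajectory more concrete, here is a minimal sketch in Python. It is not the method of [BEL 96], which is not detailed here: it substitutes the well-known Ramer-Douglas-Peucker polyline simplification, which keeps only the points where the trajectory departs significantly from a straight stroke, as a rough stand-in for "significant instants". The scene representation (object names mapped to center points), the helper names `simplify` and `resolve_reference`, the `epsilon` threshold and the nearest-to-curve matching rule are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], epsilon: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only points deviating more than epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior point farthest from the chord a-b.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = dist_to_segment(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # the whole stretch is close enough to a straight stroke
    # Otherwise split at that "significant" point and simplify both halves.
    left = simplify(points[: idx + 1], epsilon)
    right = simplify(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def resolve_reference(curve: List[Point], scene: Dict[str, Point]) -> str:
    """Hypothetical rule: pick the object whose center lies closest to the curve."""
    def dist_to_curve(p: Point) -> float:
        if len(curve) == 1:
            return math.hypot(p[0] - curve[0][0], p[1] - curve[0][1])
        return min(dist_to_segment(p, curve[k], curve[k + 1])
                   for k in range(len(curve) - 1))
    return min(scene, key=lambda name: dist_to_curve(scene[name]))


# Usage: 300 raw touch positions tracing an "L" shape with a small jitter.
trace = [(i * 100 / 149, 0.5 * math.sin(i)) for i in range(150)]
trace += [(100 + 0.5 * math.sin(j), j * 100 / 150) for j in range(1, 151)]
simplified = simplify(trace, epsilon=3.0)

# Hypothetical simplified scene: object name -> center position.
scene = {"red_square": (100.0, 5.0), "blue_circle": (50.0, 60.0),
         "green_triangle": (10.0, 90.0)}
print(len(trace), "->", len(simplified), "points:", simplified)
print("referent:", resolve_reference(simplified, scene))
```

In this example the 300 positions collapse to three points (start, corner, end), i.e. six numbers; fitting a parametric curve to those points could bring the description closer to the handful of parameters mentioned above. The matching rule is deliberately crude: a real system would also weigh gesture type (pointing, circling, crossing) and the accompanying utterance.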
Some processes require specific recording devices, the obvious examples
being a microphone for speech processing and a keyboard for written input.
Other processes can be carried out in various ways, from the most intrusive
to the most transparent. Examples of intrusive recording are the pointing
glove that the user had to put on so the system could record the position
and configuration of his/her hand, and the exoskeleton glove required for
force feedback. An increasingly common example of transparent recording is
a camera, or a system of coupled cameras, which leaves the user free and
allows several processes to be carried out simultaneously, for example
tracking his/her face and detecting the configuration of his/her hand.
Automatic speech recognition is a field in its own right, and its use in MMD
raises additional issues [JUR 09]. The goal is to go from an audio signal to a
transcription in a code that is more or less close to written language. This
requires several data sources, including an acoustic model, a list of the words
of the given language, a pronunciation dictionary and, an almost essential
source for improving performance, a language model. This model is built from
statistical corpus analyses. By bringing in context (the one, two or three
preceding words), it allows the system to compute probabilities and retain the
most probable hypotheses for the word (or other unit) currently being
recognized. In speech dictation, the language model is built from statistics
computed on texts taken from literature or the written press. We maximize the
size of these texts so as to refine the language modeling in