2.4 General-Purpose Interfaces and the Language Channels
In recognition, language interpretation is embedded into nonlanguage recognition. For example, in the visual modality humans can recognize nonlanguage 21 input such as streets, houses, cars, trees, and other agents, as well as language input such as shop signs and text in newspapers, books, and letters. Similarly
in the auditory modality: humans can recognize nonlanguage input such as the
sounds made by rain, wind, birds, or approaching cars, as well as language
input such as speech from other agents, radio, and television.
In action, language production is likewise embedded into nonlanguage action. For example, humans can use their hands for generating nonlanguage output, such as filling the dishwasher or drawing a picture, as well as for language
production in the visual modality, such as writing a letter or typing something
on a computer keyboard. Similarly in the auditory modality: humans can use
their voice for nonlanguage output, such as for singing without words or for
making noises, as well as for language production, i.e., for speech.
Such embedding of language recognition and synthesis into nonlanguage recognition and action cannot be handled by today's technologies of speech recognition and optical character recognition. Instead, these technologies use input and output channels dedicated to language. This constitutes a substantial simplification of their respective tasks (smart solution). 22
Despite this simplification and well-funded research with many promises
and predictions over several decades, automatic speech recognition has not yet
reached its goal. 23 For proof, we do not have to embark upon any argument,
but simply point to the ever-increasing number of keyboards in everyday use:
if automatic speech recognition worked in any practical way, i.e., comparably to the speech recognition capability of an average human, 24 few users would prefer the keyboard and screen over speech.
For building a talking robot like C3PO any time soon, insufficient automatic speech recognition presents an (almost certainly temporary) obstacle.
21 We prefer the term nonlanguage over nonverbal because the latter leads to confusion regarding the
parts of speech, e.g., nouns, verbs, and adjectives.
22 For the distinction between smart and solid solutions, see FoCL'99, Sect. 2.3.
23 Optical character recognition, in contrast, is quite successful, especially for printed language, and
widely used for making paper documents available online.
24 Technically speaking, a practical system of speech recognition must fulfill the following requirements
simultaneously (cf. Zue et al. 1995):
1. speaker independence,
2. continuous speech,
3. domain independence,
4. realistic vocabulary,
5. robustness.