and adjust the pace and content of the lecture accordingly. In a typical
face-to-face conversation, where all participants have equal right to
speak, this two-way interaction becomes intense.
Previous work based on the JST/ESP 1500-hour corpus of highly
interactive natural speech recorded through head-worn microphones
by volunteers in everyday situations has revealed significant changes,
particularly in voice quality and tone-of-voice, depending on the nature
of the relationship with the interlocutor and the various stages of the
interaction. We generalize from this to infer that in daily interactive
speech social relationship information is encoded into every utterance
as part of the vocal setting and that prosodic variation (which includes
voice quality) serves not only to carry linguistic and grammatical
information but also to display social and cognitive states, and to signal
the changing social relationships between the participants. This is what
human listeners are accustomed to processing and what is missing
from the stream of information in computer-synthesized speech.
Speakers in an interactive conversation constantly monitor the
attentive and cognitive states of the listener
(if such a term may still be used for the active co-participant) and
adjust their speech behavior accordingly. In addition to processing the
propositional or lexical content of each utterance, the conversation
participants also employ a different 'grammar' to process social
information carried alongside that linguistic information by the
prosody (Campbell, 2008a, 2008b).
4. 'Niblets' and Conversational Fragments
In this section, we look more closely at the issue of fragmentation in
interactive speech and introduce the concept of niblets, a new term
proposed for use in speech processing to describe individual fragments
of meaning. We use an example related to Hi-Fi audio to illustrate
a change in lifestyle and customs that might characterize a feature
of present-day conversational speech, one that poses a challenge to be
addressed by future technology.
Whereas the previous (before the web) generation had a passion
for Hi-Fi audio, often spending large amounts of money to buy the
'best' amplifiers and speakers on which to listen to their favorite record
albums, the younger generation now takes its music through lower-quality
'ear-buds', buying or downloading only a single compressed
track at a time. This trend reflects not only fashion changes and
advances in technology, but also a move away from 'quality' toward
'convenience'. The present generation is still able to buy CD-quality