in spite of what is now a high level of technical competence in both
synthesis and recognition components.
3. Use of 'Social Prosody' in a Conversation
Much of the social or interpersonal information in speech is carried
by the prosody and signaled by changes in the intonation, loudness,
rhythm, and tone-of-voice of the speaker. It is also carried by the
numerous backchannel utterances that intersperse a conversation
to show listener feedback. In our JST/ESP corpus of 1500 hours of
everyday conversational interaction (www.speech-data.jp), these
short nonverbal interjections accounted for more than half of the total
number of utterances. These words, like 'ah' and 'um', 'yeah' and
'yeah yeah yeah', are characterized both by phonetic simplicity and
prosodic complexity. They perhaps serve principally to carry tone-of-voice
information that simultaneously signals both speaker affect and relation
to the interlocutor (Campbell and Mokhtari, 2003).
The study of speech prosody has a long history, but much of the
science to date has focused on how intonation relates to the syntactic
elements of a sentence, i.e., on linguistic content. More
recently, however, the social aspects of spoken interaction have begun
to be studied from a prosodic point of view, and it has been shown that
prosody functions not only to signal the structure and relationships
of morphological, syntactic, and semantic aspects of propositional
content but also simultaneously serves as a messenger for affective
and cognitive information related to speaker participation status in a
discourse and inter-participant relationships (Campbell, 2007).
Traditional studies have been based on read speech. Read speech
and broadcast speech stand in contrast to conversational speech in
that they function primarily to convey text-based information to an
audience that is largely passive. They are one-way processes, as is
present-day dialogue technology. In the case of radio broadcasts, for
example, no real-time feedback from the audience is even possible and
the speaker has no need to take any observable cognitive states of the
listener into account when rendering text as speech. The task is simply
to render the content intelligibly (content that was originally created as
text, and which through its complexity presents a particular prosodic
challenge to the broadcaster, who is therefore usually a highly trained
performer). In the case of a public lecture, however, the audience
may be visible while listening passively with no right to speak, yet
an effective presenter will take into account such cues as small head
movements and changes in facial expression that signal understanding,