desirable for voice interfaces. To find out what happens when voice interfaces present emotionally
inconsistent paralinguistic and linguistic cues, a telephone-based experiment was conducted.
Emotional Consistency of Voice Interfaces
This study (Nass et al., 2001) compared emotionally consistent paralinguistic cues and content with
emotionally inconsistent ones; participants all heard the exact same male synthetic voice and same
content. Participants called in to a phone system that read them three news stories about clearly
happy or sad events (e.g., a new cure for cancer or dead gray whales washing ashore on the San
Francisco coast, respectively). Based on responses to a follow-up questionnaire, participants
found that happy stories sounded happier when read by a happy voice and sad stories sounded
sadder when read by a sad voice. Consistent with the earlier results, participants liked the stories more when
they were told by emotionally consistent voices than by voices that failed to match the emotion of the
story, even when the voices were clearly synthetic and obviously did not reflect “true” emotion.
It appears that humans are so strongly wired to pick up emotional information from speech
that they perceive emotions even in computer-generated speech (Nass and Brave, 2005). To hear
emotion, listeners usually rely on a speaker's paralinguistic cues: pitch range, rhythm, and changes
in amplitude and duration (Ball and Breese, 2000; Scherer, 1981; Scherer, 1986; Scherer, 1989). People
seem to integrate those cues with the spoken content when interpreting messages, even when
the messages come from virtual voices.
Implementing Emotionally Consistent Voices
When computer voices are recorded from human actors, it is relatively easy for the actors to infer
emotional meaning from the script, so they avoid the difficulty that computer-generated speech
has in keeping emotional paralinguistic cues consistent with emotional content.
Given that humans can infer emotion so easily from written text, it might seem that computers
could do the same and produce emotionally consistent readings even more efficiently than
human actors, but deducing emotion from text is a much more complex problem than it
appears (Picard and Cosier, 1997).
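As a toy sketch of the naive approach (the word lists below are hypothetical, not drawn from any published lexicon), a keyword-count classifier can label text as happy or sad; its last example also hints at why inferring emotion from text is harder than it looks, since negation, irony, and context all defeat a simple lexicon.

```python
# Toy keyword-based emotion classifier. The lexicons are illustrative
# placeholders; real emotion inference is far more complex.
HAPPY_WORDS = {"cure", "breakthrough", "celebrate", "win", "rescue"}
SAD_WORDS = {"dead", "death", "disaster", "loss", "mourn"}

def infer_emotion(text: str) -> str:
    """Return 'happy', 'sad', or 'neutral' from a crude word count."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    happy = len(words & HAPPY_WORDS)
    sad = len(words & SAD_WORDS)
    if happy > sad:
        return "happy"
    if sad > happy:
        return "sad"
    return "neutral"

print(infer_emotion("Scientists announce a new cure for cancer"))      # happy
print(infer_emotion("Dead gray whales washed ashore near the coast"))  # sad
# Negation defeats the lexicon: "no cure" still counts as happy.
print(infer_emotion("Doctors report there is no cure for the disease"))  # happy
```

The failure on the negated sentence is the point: counting words captures none of the syntactic or pragmatic context that humans use effortlessly when reading a script aloud.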
Casting Voice Emotions Within Constraints
Given the difficulty of properly casting appropriate emotional paralinguistic cues for each utter-
ance in a computer interface, it is safest to cast a slightly happy voice (Gong and Nass, 2003)
because humans have a “hedonic preference,” which means that they tend to experience, express,
and perceive positive emotions rather than negative ones. People who show more positive emotions
are liked more (Frijda, 1988; Myers and Diener, 1995) and are perceived as more attractive and more
appealing to work with (Berridge, 1999). Happy emotions are not always the best choice, though.
Humans pay more attention to and remember more about times of anger and sadness than times of
happiness (Reeves and Nass, 1996). Submissive emotions such as fear and sadness are more likely
to increase trust since the expression of such emotions are seen as emotional disclosure (Friedman
et al., 1988), opening up to the listener, which often causes him or her to reciprocate with disclo-
sure (Moon, 2000). Depending upon the goals of the computer interface, one type of emotional
voice setting could be sufficient. Designers who create both the content and the voice casting
at the same time can generate emotional voices that match the emotion of the content provided by
the computer interface (Nass and Brave, 2005).
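The casting advice above can be sketched as a mapping from an inferred content emotion to paralinguistic synthesis settings, expressed here as standard SSML `<prosody>` markup. This is a sketch under assumptions: the specific pitch and rate values are illustrative, not tuned, and the default deliberately leans slightly happy per the hedonic-preference guideline.

```python
# Sketch: choose paralinguistic settings (SSML <prosody> pitch/rate)
# from a content emotion label. The numeric values are illustrative
# assumptions, not recommendations from the cited studies.
PROSODY = {
    "happy":   {"pitch": "+15%", "rate": "110%"},  # higher, faster
    "sad":     {"pitch": "-10%", "rate": "85%"},   # lower, slower
    "neutral": {"pitch": "+5%",  "rate": "102%"},  # slightly happy default
}

def to_ssml(text: str, emotion: str = "neutral") -> str:
    """Wrap text in SSML prosody markup matching the given emotion.

    Unknown emotion labels fall back to the slightly happy default.
    """
    p = PROSODY.get(emotion, PROSODY["neutral"])
    return (f'<speak><prosody pitch="{p["pitch"]}" rate="{p["rate"]}">'
            f"{text}</prosody></speak>")

print(to_ssml("A new cure for cancer was announced today.", "happy"))
```

Falling back to the slightly happy setting, rather than a flat neutral one, reflects the point that when an interface cannot reliably infer the content's emotion, a mildly positive voice is the safest single choice.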