Digital Signal Processing Reference
manent speaker characteristics, as were also featured in the 2010 Paralinguistic
Challenge for age and gender determination. As a further speaker trait task and an
example of the interdependence of speaker traits, we also consider an application
where the height of a speaker is inferred from the voice [ 76 ] (not included in any of
the challenges). The final example of paralinguistics stems from the INTERSPEECH
2011 Speaker State Challenge [ 77 ]. The tasks at hand—speaker intoxication and
speaker sleepiness—are found somewhat in-between states and traits on a temporal
scale, as they are either 'long-term states' or 'short-term traits'.
Looking at applications of such speaker state and trait information, the following
are found among the most promising:
First, it seems obvious that speech recognition and interpretation of speakers'
intention can benefit from paralinguistic information [ 78 ], e.g., when trying to recognise
equivocation [ 79 ]. The information can also be exploited in the acoustic layer
to improve recognition of 'what' has been said, e.g., by adaptation of the acoustic
model [ 39 , 80 - 84 ].
Next, conversation analysis, mediation, and transmission can benefit from
paralinguistics, such as in computer-aided analysis of human-human conversations
including the investigation of synchrony in the prosody of married couples [ 3 ], specific
types of discourse [ 85 ] in psychology, or the analysis and summarisation of
meetings [ 86 , 87 ].
Many applications also exist in the public health sector. Hearing-impaired persons
can profit, as cochlear implant processors typically alter the spectral cues which are
crucial for the perception of paralinguistic information [ 88 ]. Children with autism
may profit from the analysis of emotional cues as they may have difficulties
understanding or displaying them [ 89 , 90 ].
Also, transmitting paralinguistic information along with other message elements
can be used to animate avatars [ 91 ], to enrich dictated text messages, or to label
calls in voice mailboxes by symbols such as emoticons [ 92 ]. Communicative virtual
agents and robots should be enriched by social competence [ 93 - 96 ] which requires
them to understand paralinguistic information from the voice, face, and gestures.
It is also believed that adapting to callers in a voice portal is of commercial interest
[ 97 ], including target-group-specific advertising. In call centres, quality management
by monitoring agents is also being researched [ 98 ]. Other applications include
serious gaming and fun applications, such as the love detector by Nemesysco Ltd. (footnote 2) or
the game “Truth or Lies—Someone Will Get Caught” for video consoles that comes
with a microphone and claims to detect lies (THQ® Entertainment, footnote 3). Further
health-related applications include monitoring elderly people living on their own [ 99 ] or
monitoring diseases and speech disorders [ 100 ] such as Parkinson's disease [ 101 , 102 ], autism
[ 103 ], cancer, cleft lip and palate [ 104 ], or dysphonia [ 105 ], as well as further pathological
effects [ 106 ].
Tutoring systems are another typical field of application, where information on
user states such as uncertainty [ 107 ], interest, stress, cognitive load [ 108 ], or even
2 http://www.nemesysco.com/
3 http://www.thq.com/