of face animation applications is emerging from interdisciplinary Human-Computer
Interaction research areas, such as using synthetic face animation for language
training [Cole et al., 1999] and psychological studies [Massaro, 1998]. For these
scenarios, the naturalness of the face animation can be compromised, while certain
motions can be exaggerated for application purposes (e.g., mouth motion for
lip-reading applications). Therefore, human subject perception experiments are
usually carried out within the application context to evaluate and guide further
improvement of the face animation. Massaro et al. [Massaro, 1998] developed a
face modeling and animation system called "Baldi". "Baldi" has been used to
generate synthetic stimuli for bimodal speech perception research. Moreover, it
has been applied in language training for school children [Cole et al., 1999].
For multi-modal human speech perception, a three-process model was proposed
by Massaro et al. [Massaro, 1998]. The three processes are: (1) "evaluation",
which transforms the sources of information into features; (2) "integration",
where multiple features are integrated both between modalities and within a
modality; and (3) "decision", which makes the perception decision based on the
integration results. For integration, a Fuzzy Logical Model of Perception (FLMP)
was proposed. It is mathematically equivalent to Bayes' theorem, which is
widely used in pattern recognition. FLMP has been shown to be effective in
integrating multiple cues for multi-modal speech perception.
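As a concrete illustration of the integration step, consider a two-alternative
case such as /ba/ versus /da/ (the alternatives are chosen here only for
exposition). If the evaluation process yields an auditory support value a and a
visual support value v for /ba/, both fuzzy truth values between 0 and 1, and
the support for the competing alternative is taken to be (1 - a) and (1 - v),
the FLMP combines the modalities multiplicatively and normalizes by the total
support:

    P(/ba/ | a, v) = (a * v) / (a * v + (1 - a) * (1 - v))

The following minimal Python sketch generalizes this relative-goodness rule to
any number of response alternatives; the function name and the dictionary
representation of the evaluation outputs are illustrative assumptions, not part
of Massaro's implementation.

    def flmp_integrate(auditory, visual):
        """FLMP integration: multiply the per-alternative support values
        from each modality, then normalize (relative goodness rule).
        `auditory` and `visual` map response alternatives to fuzzy
        truth values in [0, 1] produced by the evaluation stage."""
        support = {alt: auditory[alt] * visual[alt] for alt in auditory}
        total = sum(support.values())
        return {alt: s / total for alt, s in support.items()}

    # Example: ambiguous audio, strong visual evidence for /ba/.
    print(flmp_integrate({"ba": 0.5, "da": 0.5}, {"ba": 0.9, "da": 0.1}))
    # {'ba': 0.9, 'da': 0.1}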
“Baldi” and the FLMP model were used to test hypotheses on bimodal speech
perception. Some important results include:
The auditory and visual channels can be asynchronous to a certain degree
without affecting speech perception performance. In Massaro's work, delays of
100 to 200 ms in one channel did not interfere with performance. Some
words/phonemes allow longer delays than others. This result agrees with other
researchers' findings. McGrath and Summerfield [McGrath and Summerfield, 1985]
reported that a delay of up to 80 ms did not disrupt performance. Pandey, Kunov
and Abel [Pandey et al., 1986] found an upper limit of 120 ms. An upper limit
of 200 ms was reported by Campbell and Dodd [Campbell and Dodd, 1980].
Speech reading performance is relatively robust under various conditions,
including different peripheral views, image resolutions, viewing angles, and
distances. First, experiments show that speech reading performance does not
degrade even when the perceiver is not looking directly at the mouth. Second,
as the resolution of the face image decreases, phonemes with large-scale motion
such as /w/ can still be reliably recognized. However, phonemes featuring more
detailed motion, such as the interdental motion of /th/, can be confused with
other phonemes. Third, it is also found that speech reading performance
degrades little when viewing non-frontal faces. Profile