An Emotional Talking Head for a Humoristic Chatbot - Applications of Digital Signal Processing


facial parameters to be used for interpolating the preceding keyframe towards the present one ($\tau < 0$). The latter regards the dominance of the next viseme on the parameters used to morph the present keyframe towards the next one ($\tau > 0$). Our implementation doesn't make use of an animation engine to control the facial parameters (labial opening, labial protrusion and so on); instead, the interpolation process acts on the translation of all the vertices in the mesh. The prosodic sequence $S$ of time intervals $[t_{i-1}, t_i[$ associated with each phoneme can be expressed as follows:

$$S = \{\, f_1 \in [0, t_1[\,;\; f_2 \in [t_1, t_2[\,;\; \ldots\,;\; f_n \in [t_{n-1}, t_n[ \,\} \qquad (4)$$
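As an illustration of how the half-open intervals in (4) partition the time axis, the following minimal sketch looks up which phoneme interval contains a given time $t$. The function name `active_index` and the boundary values are hypothetical and not taken from the original implementation.

```python
from bisect import bisect_right

def active_index(boundaries, t):
    """Return the 0-based index of the interval of S that contains t.

    `boundaries` is the increasing list of end times [t_1, ..., t_n];
    the first interval starts at 0. Intervals are half-open, so a time
    equal to t_i already belongs to the following interval.
    """
    if t < 0 or t >= boundaries[-1]:
        raise ValueError("t lies outside the prosodic sequence")
    return bisect_right(boundaries, t)

# Hypothetical end times t_1, t_2, t_3 in seconds.
example_boundaries = [0.08, 0.20, 0.31]
example_index = active_index(example_boundaries, 0.10)  # falls in [t_1, t_2[
```

A binary search is used here only for convenience; for the short sequences of a single utterance a linear scan would work equally well.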

A viseme is said to be “active” when $t$ falls into the corresponding time interval. The preceding and the following visemes are called “adjacent visemes”. Because the dominance function decays as a negative exponential, only the adjacent visemes are considered when computing the weights. For each time instant, three weights must be computed from the dominance functions of the three visemes involved (the active one and its two adjacent visemes). The weights are computed as follows:

$$w_i(t) = D_i(t) = \alpha_i \exp\left(-\theta_i \,|t - \tau_i|\right) \qquad (5)$$

where $\tau_i$ is the midpoint of the $i$-th time interval. The weights $w_i$ must be normalized:

$$w'_i(t) = \frac{w_i(t)}{\sum_{j=-1}^{+1} w_{i-j}(t)} \qquad (6)$$

so that, for each time instant, the coordinates of the interpolating viseme vertices $v^{(l)}_{int}(t) \in \{V_{int}(t)\}$ will be computed as follows:

$$v^{(l)}_{int}(t) = \sum_{k=i-1}^{i+1} w'_k(t)\, v^{(l)}_k(t) \qquad (7)$$

where the index $l$ indicates corresponding vertices in all the involved keyframes.
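The weighting scheme above, from the dominance function to the blended vertex positions, can be sketched in a few lines. This is only an illustrative reading of the formulas: the function names, the representation of a keyframe as a list of (x, y, z) tuples, and all numeric values are assumptions, not the authors' code.

```python
import math

def dominance(t, alpha, theta, tau):
    """Negative-exponential dominance: alpha * exp(-theta * |t - tau|)."""
    return alpha * math.exp(-theta * abs(t - tau))

def normalized_weights(t, visemes):
    """Normalize the dominances of the previous, active and next viseme.

    `visemes` is a list of (alpha, theta, tau) triples (hypothetical layout).
    The returned weights sum to 1.
    """
    raw = [dominance(t, alpha, theta, tau) for alpha, theta, tau in visemes]
    total = sum(raw)
    return [w / total for w in raw]

def blend_vertices(weights, keyframes):
    """Weighted sum of corresponding vertices across the keyframes.

    Each keyframe is a list of (x, y, z) tuples in the same vertex order.
    """
    blended = []
    for corresponding in zip(*keyframes):
        blended.append(tuple(
            sum(w * v[axis] for w, v in zip(weights, corresponding))
            for axis in range(3)
        ))
    return blended

# Hypothetical example: three visemes with midpoints 0, 1 and 2, one vertex each.
visemes = [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0), (1.0, 2.0, 2.0)]
keyframes = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
weights = normalized_weights(1.0, visemes)
vertex = blend_vertices(weights, keyframes)[0]
```

Because the weights are normalized, the blended vertex always lies within the convex hull of the corresponding keyframe vertices.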

Our implementation also simplifies this computation. It is sufficient to determine the result of the coarticulation only for the keyframes, because the interpolation between them is obtained directly from the morphing engine with a linear control function. Once the dominance functions are determined, each coarticulated keyframe is computed, and its duration is the same as that of the corresponding phoneme.
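One possible reading of this simplification is sketched below: each keyframe is blended with its neighbours once, and a linear morph then runs between consecutive coarticulated keyframes. Two points are assumptions made for the sketch and are not stated in the text: the blend is evaluated at the midpoint of each interval, and at the ends of the sequence only the neighbours that exist are used. All names and values are hypothetical.

```python
import math

def coarticulated_keyframes(boundaries, params, keyframes):
    """Blend each keyframe with its neighbours once per phoneme.

    boundaries: end times [t_1, ..., t_n]; the first interval starts at 0.
    params:     per-viseme (alpha, theta) pairs.
    keyframes:  per-viseme vertex lists [(x, y, z), ...] in the same order.
    Returns a list of (duration, vertices) pairs, one per phoneme.
    """
    starts = [0.0] + list(boundaries[:-1])
    mids = [(s + e) / 2 for s, e in zip(starts, boundaries)]
    result = []
    for i, t in enumerate(mids):
        # Assumption: missing neighbours at the sequence ends are skipped.
        idx = [k for k in (i - 1, i, i + 1) if 0 <= k < len(keyframes)]
        raw = [params[k][0] * math.exp(-params[k][1] * abs(t - mids[k]))
               for k in idx]
        total = sum(raw)
        vertices = [
            tuple(sum(w / total * keyframes[k][l][axis]
                      for w, k in zip(raw, idx))
                  for axis in range(3))
            for l in range(len(keyframes[i]))
        ]
        # The coarticulated keyframe lasts as long as its phoneme.
        result.append((boundaries[i] - starts[i], vertices))
    return result

def lerp_vertices(a, b, u):
    """Linear control function: morph vertex list a towards b, u in [0, 1]."""
    return [tuple((1 - u) * p + u * q for p, q in zip(va, vb))
            for va, vb in zip(a, b)]
```

With this arrangement the exponential weights are evaluated once per phoneme rather than once per rendered frame, which is the saving the simplification aims at.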

4.2.2 Diphthongs and dominant visemes

A sequence of two adjacent vowels is called a diphthong. The word “euro” contains one diphthong. The vowels in a diphthong must be visually distinct, as two separate entities, so the visemes belonging to them must not influence each other; otherwise the two vowel visemes would fuse and become indistinguishable. To avoid this problem, the slope of the dominance function belonging to each vowel viseme in a diphthong must be very steep (see Fig. 2). By contrast, a vowel-consonant sequence requires a different profile of the dominance function. Indeed, the consonant is heavily influenced by the preceding vowel: a vowel must be dominant with respect to the adjacent consonants, but not with respect to other vowels. As shown in Fig. 3, the dominance of a vowel over a consonant is obtained by giving the vowel a less steep curve than the consonant.
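The effect of the slope can be checked numerically with a small sketch. It considers only two visemes at a time and measures how much normalized weight a neighbouring viseme retains at the midpoint of the active one. The steepness values and the 0.1 s distance between midpoints are hypothetical, chosen only to make the contrast visible.

```python
import math

def neighbour_influence(theta_neighbour, gap, alpha=1.0):
    """Normalized weight of a neighbouring viseme at the active viseme's midpoint.

    At its own midpoint the active viseme has dominance alpha, whatever its
    slope, so the result depends only on the neighbour's steepness and on
    the distance `gap` between the two midpoints.
    """
    w_active = alpha
    w_neighbour = alpha * math.exp(-theta_neighbour * gap)
    return w_neighbour / (w_active + w_neighbour)

gap = 0.1  # hypothetical distance between midpoints, in seconds

# Diphthong: both vowels get a very steep curve, so they barely mix.
vowel_on_vowel = neighbour_influence(theta_neighbour=60.0, gap=gap)

# Vowel-consonant: the vowel's curve is less steep than the consonant's,
# so the vowel colours the consonant far more than the reverse.
vowel_on_consonant = neighbour_influence(theta_neighbour=10.0, gap=gap)
consonant_on_vowel = neighbour_influence(theta_neighbour=40.0, gap=gap)
```

With these values the vowel keeps a substantial share of the weight inside the consonant's interval, while the steep curves leave each vowel of a diphthong almost untouched by the other.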
