models have been developed; an interesting overview can be found in Ortony (1997). Among the models cited there, the model by Ekman has been chosen as the basis for our work. According to Ekman's model, there are six primary emotions: anger, disgust, fear, joy, sadness, and surprise. We have developed a reduced version of this model, including only three of the listed basic emotions: anger, joy, and sadness. We selected them as the basis for expressing humor.
At present our agent is able to express one of these three emotions at a time, with a variable intensity level. The emotional state of the agent is represented by a pair of values: the felt emotion and its corresponding intensity. The state is established on the basis of the humor level detected in the conversation. As stated above, there are only three possible values for the humor level. Each level must correspond to a specific emotion in the chatbot, with an associated intensity, and this correspondence should be defined according to a collection of psychological criteria. Currently, the talking head has a predefined behavior for its humorous attitude, used to express these humor levels: each level is expressed with a specific emotion at a certain intensity. These emotional patterns represent the default behavior of the agent. The programmer can create a personalized emotional behavior by defining different correspondences between humor levels and emotional intensities. Moreover, specialized behaviors can also be programmed for single steps of the conversation or single witticisms, as exceptions to the default one, as sketched below.
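As a rough illustration of this correspondence, the default mapping and a programmer-supplied exception might be organized as follows. All names and values here (EMOTIONS, default_behavior, the chosen intensities) are hypothetical, not taken from the actual system.

```python
# Hypothetical sketch of the humor-level -> (emotion, intensity) mapping.
# Names and values are illustrative, not the system's actual definitions.
from dataclasses import dataclass

EMOTIONS = ("anger", "joy", "sadness")       # reduced Ekman set used here
HUMOR_LEVELS = ("low", "medium", "high")     # the three detected humor levels

@dataclass
class EmotionalState:
    emotion: str       # one of EMOTIONS
    intensity: float   # 0.0 (neutral) .. 1.0 (full expression)

# Default behavior: each humor level maps to one emotion with an intensity.
default_behavior = {
    "low":    EmotionalState("sadness", 0.3),
    "medium": EmotionalState("joy", 0.5),
    "high":   EmotionalState("joy", 0.9),
}

def resolve_state(humor_level, overrides=None):
    """Return the agent's emotional state, honoring programmer overrides
    (e.g., for a single witticism) before falling back to the default."""
    if overrides and humor_level in overrides:
        return overrides[humor_level]
    return default_behavior[humor_level]

# A specialized behavior for one witticism, as an exception to the default:
sarcastic_override = {"high": EmotionalState("anger", 0.4)}
print(resolve_state("high", sarcastic_override))
```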
The established emotional state has to be expressed through prosody and facial expressions, both of which are generated by the EMOTIONAL AREA module. This task is launched by ad hoc AIML tags.
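The chapter does not give the tag syntax, so purely as a sketch under assumptions, a custom tag embedded in the chatbot's answer could be intercepted and forwarded to the emotional area as follows; the <emotion> tag name, its attributes, and the handler are all invented for illustration.

```python
# Hypothetical dispatch of an ad hoc emotion tag found in a chatbot reply.
# The <emotion .../> tag name and attributes are assumptions for illustration.
import re

EMOTION_TAG = re.compile(r'<emotion\s+name="(\w+)"\s+intensity="([\d.]+)"\s*/>')

def process_reply(reply, emotional_area):
    """Strip the ad hoc tag from the reply text and forward its payload
    to the module that generates prosody and facial expressions."""
    match = EMOTION_TAG.search(reply)
    if match:
        emotional_area(match.group(1), float(match.group(2)))
        reply = EMOTION_TAG.sub("", reply)
    return reply.strip()

# Example: an AIML template produced this annotated answer.
text = process_reply('Very funny! <emotion name="joy" intensity="0.9"/>',
                     lambda e, i: print(f"express {e} at intensity {i}"))
print(text)  # prints "express joy at intensity 0.9", then "Very funny!"
```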
4. EHeBby talking head
Our talking head is conceived as a multi-platform system able to speak several languages, and various implementations have been realized. In what follows, the different components of our model are presented: model generation, animation technique, coarticulation, and emotion management.
4.1 Face model generation
The FaceGen Modeler FaceGen (2010) has been used to generate the graphic models of the 3D head. FaceGen is a tool for creating 3D human heads and characters as polygon meshes, whose facial expressions are controlled by means of numerical parameters. Once the head is created, it can be exported as a Wavefront Technologies .obj file containing the information about the vertices, normals, and textures of the facial mesh. The .obj format is compatible with the most popular high-level graphics libraries, such as Java3D and OpenGL.
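As a minimal sketch (not the project's actual loader), reading the vertex, normal, and texture-coordinate records from such an .obj export could look like this; the file name is a placeholder.

```python
# Minimal sketch of parsing the vertex data from a FaceGen .obj export.
# The file name is a placeholder; real loaders also handle faces, groups,
# and materials, which are omitted here for brevity.
def load_obj(path):
    vertices, normals, texcoords = [], [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "v":          # geometric vertex: v x y z
                vertices.append(tuple(map(float, parts[1:4])))
            elif parts[0] == "vn":       # vertex normal: vn x y z
                normals.append(tuple(map(float, parts[1:4])))
            elif parts[0] == "vt":       # texture coordinate: vt u v
                texcoords.append(tuple(map(float, parts[1:3])))
    return vertices, normals, texcoords

verts, norms, uvs = load_obj("head.obj")
print(len(verts), "vertices loaded")
```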
faces with different poses is generated to represent a “viseme”, which is related to a phoneme
or a groups of phonemes. A phoneme is the elementary speech sound, that is the smallest
phonetic unit in a language. Indeed, the spoken language can be thought as a sequence of
phonemes. The term “viseme” appeared in literature for the first time in Fischer (1968) and
it is equivalent to the phoneme for the face gesture. The viseme is the facial pose obtained
by articulatory movements during the phoneme emission. Emotional expressions can be
generated by FaceGen also. In our work we have implemented just 4 out of the Ekman basic
emotions Ekman & Friesen (1969): joy, surprise, anger, sadness. The intensity of each emotion
can be controlled by a parameter or mixed to each other, so that a variety of facial expressions
can be obtained. Such “emotional visemes” will be used during the animation task. Some
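A minimal sketch of such blending, assuming each emotion is stored as per-vertex displacements from a neutral mesh (a common morph-target scheme; the chapter does not specify the exact representation):

```python
# Hypothetical morph-target blending of emotional expressions.
# Assumes each emotion is stored as per-vertex offsets from the neutral
# pose, which is one common scheme; the actual representation may differ.
import numpy as np

n_vertices = 4          # toy mesh size for the example
neutral = np.zeros((n_vertices, 3))
offsets = {             # displacement of each vertex at full intensity
    "joy":   np.random.rand(n_vertices, 3),
    "anger": np.random.rand(n_vertices, 3),
}

def blend(weights):
    """Mix emotions: weights maps emotion name -> intensity in [0, 1]."""
    pose = neutral.copy()
    for emotion, w in weights.items():
        pose += w * offsets[emotion]
    return pose

# A face that is mostly joyful with a hint of anger:
mixed = blend({"joy": 0.8, "anger": 0.2})
print(mixed.shape)  # (4, 3): one blended position per vertex
```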
Some optimizations can be performed to decrease the amount of memory necessary to store such a set of visemes: only the head geometry needs to be loaded from the .obj file, while lights and virtual camera parameters are set within the programming code. Moreover, part of the head mesh can be loaded as a static background mesh, and afterwards the three sub-meshes corresponding to the face, tongue, and teeth are loaded, as illustrated below.
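Purely for illustration (the file names, viseme names, and per-pose organization are assumptions, and the load_obj sketch from above is reused), the storage might keep the bulk of the head once and only the varying sub-meshes per pose:

```python
# Hypothetical layout: the background mesh is shared across all visemes,
# while only the face, tongue, and teeth sub-meshes vary per pose.
# Reuses the load_obj sketch defined earlier; file names are placeholders.
background = load_obj("head_background.obj")      # loaded once

MOVING_PARTS = ("face", "tongue", "teeth")

visemes = {}
for name in ("neutral", "aa", "oo", "mm"):        # illustrative viseme names
    visemes[name] = {
        part: load_obj(f"{name}_{part}.obj")      # small per-pose sub-meshes
        for part in MOVING_PARTS
    }

def assemble(viseme_name):
    """Rendering a pose only swaps the three small sub-meshes."""
    return background, visemes[viseme_name]
```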