2.2 Current status
Initially, human tutoring sessions with children were videotaped and extensively
analyzed, producing a set of annotated tapes and transcripts. These data may
serve as input to machine learning algorithms that attempt to establish a
mapping from information about the user's state to tutor actions.
[Figure: The setting for the Wizard-of-Oz experiments.]
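A mapping from user-state information to tutor behavior can be illustrated with a deliberately simple learner. The sketch below is a frequency-based baseline, not the method used in this work. From annotated (user state, tutor action) pairs, it learns the tutor action observed most often for each state and falls back to the overall most frequent action for states never seen. The function names and all state and action labels (`confused`, `give_hint`, and so on) are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def learn_state_to_action(pairs):
    """Map each annotated user state to the tutor action seen most often with it.

    pairs: iterable of (user_state, tutor_action) tuples, e.g. extracted from
    annotated transcripts. All labels here are hypothetical placeholders.
    Returns (table, default): a state->action dict and a fallback action.
    """
    per_state = defaultdict(Counter)  # state -> Counter of tutor actions
    overall = Counter()               # action frequencies across all states
    for state, action in pairs:
        per_state[state][action] += 1
        overall[action] += 1
    if not overall:
        raise ValueError("no annotated (state, action) pairs supplied")
    # Majority vote per state; ties go to the action encountered first.
    table = {state: counts.most_common(1)[0][0] for state, counts in per_state.items()}
    default = overall.most_common(1)[0][0]
    return table, default


def predict_action(table, default, state):
    """Look up the learned action; unseen states get the globally most common one."""
    return table.get(state, default)


# Hypothetical annotated pairs, for illustration only.
pairs = [
    ("confused", "give_hint"),
    ("confused", "give_hint"),
    ("confused", "praise"),
    ("engaged", "praise"),
    ("engaged", "praise"),
    ("frustrated", "encourage"),
]
table, default = learn_state_to_action(pairs)
print(table)
print(predict_action(table, default, "bored"))
```

A lookup table like this ignores context such as the history of the session. A practical learner would probably condition on richer state information, but the input/output shape (annotated states in, tutor actions out) stays the same.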
An integrated human-computer interaction control system is currently being
developed. It displays the results of multimodal input and output processing
for sessions run in a Wizard-of-Oz environment, illustrated in Figure 9.2. In
this environment, the child and the tutor are in separate rooms, and the child
is not aware that a human tutor is present. Instead, the child believes he or
she is interacting with the computer through a face avatar. The avatar produces
synthesized speech, displays emotional expressions, and directs the child's
gaze to selected regions. Meanwhile, the child's behavior is recorded for
further study. The tutor, for their part, can see the results of the multimodal
signal analysis and initiate appropriate actions. The interfaces for the child
and the tutor are shown in Figure 9.3. This system is currently being used in
educational and psychological research, where experiments are being carried out
to classify multimodal input into user state categories. This automatic
classification is beginning to replace manual analysis of some of the video data.
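One simple way to classify multimodal input into user state categories is early fusion followed by a nearest-centroid rule. Per-modality features are concatenated into one vector, and a new observation receives the label of the closest class mean. This is a stand-in technique chosen for illustration, not the classifier under study. The feature choices are hypothetical scalars of the kind this section mentions: a smile-intensity score from expression recognition, a pitch-variance value from speech prosody, and an error rate derived from the task state. The state labels and function names are also hypothetical.

```python
import math


def fuse(face, prosody, task):
    """Early fusion: concatenate per-modality feature lists into one vector."""
    return list(face) + list(prosody) + list(task)


def train_centroids(samples):
    """Compute the mean fused vector for each user-state label.

    samples: list of (feature_vector, state_label) pairs.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, vec):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical features: [smile intensity], [pitch variance], [task error rate].
samples = [
    (fuse([0.9], [0.2], [0.1]), "engaged"),
    (fuse([0.8], [0.3], [0.0]), "engaged"),
    (fuse([0.1], [0.7], [0.6]), "frustrated"),
    (fuse([0.2], [0.9], [0.8]), "frustrated"),
]
centroids = train_centroids(samples)
print(classify(centroids, fuse([0.7], [0.3], [0.2])))
```

Euclidean distance weights every dimension equally, so features on different scales would normally be standardized before fusion. Late fusion, which combines per-modality classifier scores instead of raw features, is a common alternative when modalities arrive at different rates.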
As shown in Figure 9.3, 3D face tracking and expression recognition techniques
now supply some of the cues used to estimate the user's state. Other useful
cues include speech prosody and the state of the task. On the output side,
synthetic face animation is used as the avatar to interact with the