utterances in the language. As course developers author scenario dialogs and other
learning materials, the example utterances in these materials are entered into the
database and then used to create the language models. Speech recognition is performed using the Julius decoder, an open-source speech recognition engine developed under the leadership of Kyoto University [6].
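As a rough illustration of this pipeline, the sketch below (not the authors' implementation) shows how authored example utterances might be aggregated into simple bigram statistics that could feed a statistical language model for a decoder such as Julius; the utterance list and function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical example utterances entered by course developers while
# authoring scenario dialogs (illustrative only).
example_utterances = [
    "salaam alaikum",
    "how are you today",
    "my name is john",
]

def build_bigram_counts(utterances):
    """Count word bigrams over the authored example utterances."""
    counts = Counter()
    for sentence in utterances:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

if __name__ == "__main__":
    for (w1, w2), n in build_bigram_counts(example_utterances).most_common():
        print(f"{w1} {w2}\t{n}")
```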
Next, the agent evaluates and interprets the communicative intent of the learner's
utterance and gesture. Communicative intents are represented using a library of
communicative acts, derived from speech act theory originating in the work of Austin
[1], and further developed by Traum and Hinkelman [13]. Each communicative act
has a core function, i.e., the illocutionary function of the utterance (to greet, inform,
request, etc.), and a grounding function, i.e., the role of the utterance in coordinating
the conversation (e.g., to initiate, continue, acknowledge, etc.). The grounding
functions help to determine the current dialog context, which in turn can influence
how subsequent utterances are interpreted. At each point in the dialog, the agent is
expecting to hear and respond to one of a set of possible communicative acts, which
changes over the course of the conversation. If the learner says something that is not
appropriate at that stage of the conversation, e.g., greeting a character at the end of the
conversation instead of the beginning, the agent will act as if the learner said
something odd that does not make sense.
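To make this concrete, the following sketch assumes a simplified representation of communicative acts with a core (illocutionary) function and a grounding function, plus a dialog state that tracks which acts the agent currently expects; class and field names are illustrative, not taken from the system described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CommunicativeAct:
    core: str        # illocutionary function: "greet", "inform", "request", ...
    grounding: str   # grounding function: "initiate", "continue", "acknowledge", ...

class DialogState:
    def __init__(self, expected_cores):
        # Core functions the agent expects at this point in the conversation;
        # this set changes as the dialog progresses.
        self.expected_cores = set(expected_cores)

    def interpret(self, act: CommunicativeAct) -> str:
        if act.core in self.expected_cores:
            return f"respond to {act.core}/{act.grounding}"
        # Out-of-context act, e.g. a greeting at the end of the conversation.
        return "react as if the learner said something that does not make sense"

# Example: at the end of a conversation the agent expects a farewell or thanks,
# so a greeting is treated as out of place.
state = DialogState(expected_cores={"take-leave", "thank"})
print(state.interpret(CommunicativeAct(core="greet", grounding="initiate")))
```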
Note that the interpretation of the utterance and gesture depends upon the particular
culture being modeled. The mapping from utterances to communicative acts is
specified for each language. Some gestures have meaning only in certain cultures,
e.g., placing the palm of the right hand over the heart in greeting only has meaning in
Islamic countries. Some gestures are appropriate only in some social contexts; for
example, American and Arab cultures differ as to when it is acceptable to shake hands
with someone of the opposite sex, or to kiss the cheek of someone of the same sex.
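One simple way to picture this culture-dependence is a lookup keyed by gesture and culture, as in the sketch below; the gesture names and culture labels are illustrative assumptions rather than the system's actual data.

```python
# Hypothetical mapping from (gesture, culture) to a conventional meaning.
# Absence of an entry means the gesture carries no conventional meaning
# in that culture.
GESTURE_MEANINGS = {
    ("hand-over-heart", "iraqi-arabic"): "respectful greeting",
    ("handshake", "american-english"): "greeting",
}

def interpret_gesture(gesture: str, culture: str):
    return GESTURE_MEANINGS.get((gesture, culture))

print(interpret_gesture("hand-over-heart", "iraqi-arabic"))      # respectful greeting
print(interpret_gesture("hand-over-heart", "american-english"))  # None: no meaning
```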
Depending upon the type of dialog exercise, the agent may not just evaluate the
appropriateness of the learner's communication, but also identify and classify the
learner's mistakes. Operational Pashto and goEnglish both include so-called mini-
dialog exercises, in which learners practice individual conversational turns with a
non-player character and receive feedback regarding any mistakes they may have
made. Detected errors include grammatical errors, semantic errors (e.g., confusing
words with similar meanings), and pragmatic errors (e.g., inappropriate use of
expressions of politeness, honorifics, etc.).
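The error categories named above could be represented along the lines of the following sketch, which pairs each detected error with a category used to generate feedback; the representation itself is an illustrative assumption.

```python
from enum import Enum
from dataclasses import dataclass

class ErrorType(Enum):
    GRAMMATICAL = "grammatical"   # e.g., wrong agreement or word order
    SEMANTIC = "semantic"         # e.g., confusing words with similar meanings
    PRAGMATIC = "pragmatic"       # e.g., inappropriate politeness or honorifics

@dataclass
class LearnerError:
    error_type: ErrorType
    description: str

# Hypothetical feedback produced after one mini-dialog turn.
feedback = [
    LearnerError(ErrorType.PRAGMATIC, "Too informal when addressing an elder."),
]
for err in feedback:
    print(f"[{err.error_type.value}] {err.description}")
```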
Once the learner's input is interpreted, the intent planning stage occurs, in which
each agent in the conversation decides how to respond. Intent planning is challenging
because it must address multiple conflicting needs: accuracy, versatility, authorability,
and run-time performance. The agents should choose communicative acts that are
culturally appropriate, e.g., that match the dialog examples created in the cultural data
development process described in section 4. However, the agent models cannot simply
follow the example dialogs as scripts, but need to be versatile enough to respond in a
culturally appropriate way regardless of what the learners might say. The agent
modeling language needs to be powerful enough to achieve such versatility, yet be
authorable by instructional designers who lack the computer science background
required for sophisticated agent programming languages. It is also important for the
intent-planning module to have good runtime performance, so that the intent planning