human communication. Communicative acts can range from a spoken
word to a segmented gesture (e.g., start and end time of a pointing)
or a prosodic act (e.g., region of low pitch).
To improve the expressiveness of these communicative acts, the
idea of an encoding dictionary was proposed. Since communicative
acts are not always synchronous, they are allowed to be represented
with various delays and lengths. In the experiments with backchannel
feedback, 13 encoding templates were identified to represent a
wide range of ways that speaker actions can influence the listener
backchannel feedback. These encoding templates help to represent
long-range dependencies that are otherwise hard to learn directly
with a sequential probabilistic model (e.g., when the influence
of an input feature decays slowly over time, possibly with a delay).
An example of a long-range dependency is the effect of low-pitch
regions on backchannel feedback, with an average delay of 0.7
seconds (observed by Ward and Tsukahara (2000)). In the prediction
framework, the prediction model will pick an encoding template with
a 0.5-second delay, and the exact alignment will be learned by the
sequential probabilistic model (e.g., Latent-Dynamic CRF), which will
also take into account the influence of other input features. The three
main types of encoding templates are:
• Binary encoding: This encoding is designed for speaker features
directly synchronized with listener backchannel.
• Step function: This encoding is a generalization of binary encoding
by adding two parameters: width of the encoded feature and
delay between the start of the feature and its encoded version.
This encoding is useful if the feature influence on backchannel
is constant but with a certain delay and duration.
• Ramp function: This encoding linearly decreases for a set period
of time (i.e., width parameter). This encoding is useful if the
feature influence on backchannel is changing over time.
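The three template types above can be illustrated with a minimal sketch. It assumes that each speaker feature is sampled as a list of 0/1 values, one per frame, and that delay and width are given in frames (for instance, at a hypothetical rate of 10 frames per second, a 0.5-second delay would be 5 frames). The function names, the frame-based representation, and the optional delay on the ramp are assumptions made for illustration only; they are not taken from the original work.

```python
def binary_encoding(feature):
    """Binary encoding: active exactly while the feature itself is active."""
    return [1.0 if x else 0.0 for x in feature]


def _onsets(feature):
    """Frame indices where the feature switches from off to on."""
    return [i for i, x in enumerate(feature)
            if x and (i == 0 or not feature[i - 1])]


def step_encoding(feature, delay, width):
    """Step function: constant 1.0 for `width` frames,
    starting `delay` frames after each feature onset."""
    out = [0.0] * len(feature)
    for start in _onsets(feature):
        first = start + delay
        last = min(first + width, len(feature))
        for t in range(first, last):
            out[t] = 1.0
    return out


def ramp_encoding(feature, width, delay=0):
    """Ramp function: starts at 1.0 and decreases linearly over
    `width` frames. The optional `delay` is an assumption of this sketch."""
    out = [0.0] * len(feature)
    for start in _onsets(feature):
        for k in range(width):
            t = start + delay + k
            if t >= len(feature):
                break
            # Keep the larger value if two ramps overlap.
            out[t] = max(out[t], 1.0 - k / width)
    return out


# A feature that is active on frames 1 and 2 only.
feature = [0, 1, 1, 0, 0, 0, 0, 0]
print(binary_encoding(feature))
print(step_encoding(feature, delay=2, width=3))
print(ramp_encoding(feature, width=4))
```

In this sketch the step and ramp encodings are anchored at the start of the feature, following the description of the delay parameter above. The encoded sequences would then serve as inputs to the sequential probabilistic model, which learns the exact alignment.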
It is important to note that a feature can have an individual influence
on backchannel and/or a joint influence. An individual influence
means the input feature directly influences listener backchannel. For
example, a long pause can, by itself, trigger backchannel feedback
from the listener. A joint influence means that more than one feature
is involved in triggering the feedback. For example, saying the word
“and” followed by a look back at the listener can trigger listener
feedback. This also means that a feature may need to be encoded in
more than one way, since it may have an individual influence as well
as one or more joint influences.