(resulting from the fusion of several modalities), confidence scores,
timestamps, as well as incompatible interpretations (“one-of”). Johnston
(2009) presents a variety of multimodal interfaces combining speech-,
touch- and pen-based input that have been developed using the EMMA
standard.
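
To make this concrete, the following is a minimal sketch (in Python, for illustration) of an EMMA-style annotation carrying two mutually exclusive interpretations with confidence scores and timestamps. The element and attribute names follow the W3C EMMA 1.0 Recommendation; the application payload (command, object) and the selection logic are assumptions made up for this sketch.

import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Two mutually exclusive readings of one utterance ("one-of"), each with a
# confidence score; start/end are absolute timestamps in milliseconds.
emma_doc = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
               emma:start="1241035886246" emma:end="1241035888306">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <command>move</command><object>this</object>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68">
      <command>remove</command><object>this</object>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""

root = ET.fromstring(emma_doc)
# Pick the highest-confidence reading among the mutually exclusive ones.
best = max(root.iter(f"{{{EMMA_NS}}}interpretation"),
           key=lambda i: float(i.get(f"{{{EMMA_NS}}}confidence")))
print(best.get("id"))  # int1
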
3.3 Choice of segments to be considered in the fusion process
Most systems start from the assumption that the complete input
provided by the user can be integrated. Furthermore, they presume
that the start and end points of input in each modality are given, for
example, by requiring the user to explicitly mark it in the interaction.
Under such conditions, the determination of processing units to be
considered in the fusion process is rather straightforward. Typically,
temporal constraints are considered to find the best candidates to
be fused with each other. For example, a pointing gesture should
occur at approximately the same time as the corresponding natural
language expression, although it is not necessary that the two modalities
temporally overlap.
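
As an illustration of such temporal pairing, here is a minimal sketch in Python. The Segment fields, the helper names, and the 1.5-second proximity window are illustrative assumptions, not values from the literature.

from dataclasses import dataclass

@dataclass
class Segment:
    modality: str   # e.g. "speech" or "gesture"
    content: str    # recognized words or gesture label
    start: float    # seconds
    end: float

def temporal_distance(a, b):
    # 0.0 if the segments overlap, otherwise the size of the gap between them.
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))

def pair_candidates(speech_segments, gesture_segments, max_gap=1.5):
    # Pair each speech segment with the temporally closest gesture; the two
    # must be close in time but need not overlap.
    pairs = []
    for s in speech_segments:
        close = [g for g in gesture_segments if temporal_distance(s, g) <= max_gap]
        if close:
            pairs.append((s, min(close, key=lambda g: temporal_distance(s, g))))
    return pairs

speech = [Segment("speech", "delete this one", 2.0, 3.1)]
gestures = [Segment("gesture", "point@obj42", 3.5, 3.7)]  # close, no overlap
print(pair_candidates(speech, gestures))
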
However, there are cases in which such an assumption is problematic
and may prevent a system from deriving a semantic interpretation.
First, an input component may produce an erroneous recognition result
that cannot be integrated. Second, the user may unintentionally provide
input, for example, by making a hand movement that was not intended
as a gesture.
In natural environments where users interact freely, the situation
becomes even harder: users constantly move their arms, but not every
movement is meant to be part of a system command. If eye gaze is
employed as a means to indicate a referent, the determination of
segments becomes even more challenging. Users tend to visually fixate
the objects they refer to; however, not every fixation is meant to
contribute to a referring expression.
A first approach to this problem was presented by Sun et al. (2009),
who propose a multimodal input fusion approach that flexibly skips
spare information in multimodal inputs that cannot be integrated.
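
The following is a rough sketch of such skipping during integration, loosely in the spirit of the approach just cited. The can_integrate test and the item structure are hypothetical stand-ins for a real unification-based compatibility check.

def can_integrate(partial, item):
    # Hypothetical compatibility test: a command takes at most one referent.
    return not ("referent" in partial and item.get("referent"))

def fuse_with_skipping(items):
    # Greedy left-to-right integration; non-integrable items are skipped
    # rather than causing the whole fusion attempt to fail.
    partial, skipped = {}, []
    for item in items:
        if can_integrate(partial, item):
            partial.update({k: v for k, v in item.items() if v is not None})
        else:
            skipped.append(item)   # spare/spurious input, ignored
    return partial, skipped

items = [
    {"command": "move", "referent": None},
    {"referent": "obj7"},       # deliberate pointing gesture
    {"referent": "obj12"},      # stray fixation: skipped
]
print(fuse_with_skipping(items))
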
3.4 Dealing with imperfect data in the fusion process
Multimodal interfaces often have to deal with uncertain data. Individual
signals may be noisy or hard to interpret, and some modalities may
be more problematic than others. A fusion mechanism should take these
uncertainties into account when integrating the modalities into a common
semantic representation.
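
As a minimal sketch of confidence-aware integration: assuming each modality delivers a scored distribution over candidate interpretations, one common (naive-Bayes-style) choice is to multiply the per-candidate scores and renormalize. All names and numbers below are invented for illustration.

def fuse_scores(*modality_scores):
    # Multiply per-modality confidences per candidate and renormalize.
    candidates = set().union(*modality_scores)
    joint = {c: 1.0 for c in candidates}
    for scores in modality_scores:
        for c in candidates:
            joint[c] *= scores.get(c, 1e-6)  # small floor for missing mass
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

speech = {"obj7": 0.6, "obj12": 0.4}   # noisy speech-based reference
gaze   = {"obj7": 0.8, "obj12": 0.2}   # fixation-based evidence
print(fuse_scores(speech, gaze))       # obj7 clearly dominates
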