Between recognition and synthesis, an intelligent system needs to process language, cross-relate
language to vision and other senses (a task known as multimodal sensor fusion), and make decisions
about how to act in the world. Many labs tackle this problem with natural language as the nexus of
these capabilities, an approach known as natural language processing (NLP).
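To make the notion of multimodal sensor fusion a little more concrete, the following is a minimal sketch under simplifying assumptions: a late-fusion scheme in which features extracted separately from vision and speech are concatenated and passed to a single decision rule. The feature extractors, weights, and inputs are hypothetical placeholders, not any particular lab's system.

```python
# Minimal, hypothetical sketch of late multimodal sensor fusion:
# per-modality features are concatenated and passed to one decision rule.
# Real systems use far richer features and learned fusion weights.
import numpy as np

def extract_vision_features(image: np.ndarray) -> np.ndarray:
    """Stand-in for a vision front end (here just brightness statistics)."""
    return np.array([image.mean(), image.std()])

def extract_speech_features(audio: np.ndarray) -> np.ndarray:
    """Stand-in for a speech front end (signal energy and zero-crossing rate)."""
    zero_crossing_rate = np.mean(np.abs(np.diff(np.sign(audio)))) / 2.0
    return np.array([np.sqrt(np.mean(audio ** 2)), zero_crossing_rate])

def fuse_and_decide(image: np.ndarray, audio: np.ndarray,
                    weights: np.ndarray, bias: float) -> bool:
    """Late fusion: concatenate per-modality features, apply a linear rule."""
    fused = np.concatenate([extract_vision_features(image),
                            extract_speech_features(audio)])
    return float(fused @ weights + bias) > 0.0

# Toy usage with random inputs and hand-set weights (illustrative only).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
audio = rng.standard_normal(16000)
print(fuse_and_decide(image, audio,
                      weights=np.array([1.0, 0.5, 0.2, -0.3]), bias=-0.4))
```

The design choice illustrated is only that fusion can happen after each sense has been processed on its own; early fusion of raw signals and learned fusion layers are equally common in practice.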
Some NLP researchers ambitiously attempt to model human grammars in their entirety, while others,
such as the Cyc project of Austin, Texas, encode ontological relationships into expert systems, an
approach that has proven successful for some limited applications. Many functional natural
language applications, such as electronic ticketing agents or IBM's Natural Language Assistant
(NLA) search engine (Chai et al., 2002), compensate for their inability to understand full, general
language by relying on the constraints of the application's specific situations. Other ambitious
language-engine projects attempt to model the emergence of language: the paths by which a
human or a machine can acquire language from a social environment.
Under the hypothesis that language is inherently an emergent phenomenon, Luc Steels and other
researchers at the Sony Computer Science Lab in Paris are teaching Sony AIBO robots to recognize
objects via natural language games (Steels and Kaplan, 2002; Boyd, 2002). The results are
promising. While these robots are learning only the simplest of grammars and words, they are
doing so under highly variable conditions, and can recognize learned objects independent of
lighting or viewing angle. In fact, this method has considerably outperformed other language
acquisition systems based on neural networks or symbolic labeling theories (Steels and
Kaplan, 2002). It should be emphasized that such a natural language system is an integration of many
cognitive components: vision, gesturing, pattern recognition, speech analysis and synthesis,
conceptualization, interpretation, behavioral recognition, action, and so on.
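As a rough illustration of the kind of language game involved, the sketch below implements a bare-bones naming game under simplifying assumptions: two software agents repeatedly negotiate words for a small set of objects, strengthening a word-object association on success and adopting the speaker's word on failure. It is not the AIBO implementation; the scoring scheme, increments, and object set are invented for illustration.

```python
# Bare-bones naming game, loosely in the spirit of Steels' language games.
# All parameters (scores, increments, object set) are illustrative assumptions.
import random

class Agent:
    def __init__(self):
        self.lexicon = {}  # object -> {word: association score}

    def name_for(self, obj):
        """Return the strongest word for obj, inventing one if necessary."""
        words = self.lexicon.setdefault(obj, {})
        if not words:
            words["w%04d" % random.randrange(10000)] = 0.5
        return max(words, key=words.get)

    def reinforce(self, obj, word, delta):
        """Raise or lower the association between obj and word, clipped to [0, 1]."""
        words = self.lexicon.setdefault(obj, {})
        words[word] = min(1.0, max(0.0, words.get(word, 0.0) + delta))

def play_round(speaker, hearer, objects):
    obj = random.choice(objects)
    word = speaker.name_for(obj)
    success = word in hearer.lexicon.get(obj, {})  # does the hearer know this word?
    if success:
        speaker.reinforce(obj, word, +0.1)
        hearer.reinforce(obj, word, +0.1)
    else:
        speaker.reinforce(obj, word, -0.1)
        hearer.reinforce(obj, word, +0.1)  # the hearer adopts the speaker's word
    return success

agents = [Agent(), Agent()]
objects = ["ball", "cube", "cone"]
successes = sum(play_round(*random.sample(agents, 2), objects) for _ in range(2000))
print("success rate over 2000 rounds:", successes / 2000)
```

With only two agents the success rate converges quickly; larger populations usually add lateral inhibition (weakening rival words after a success) so that the group settles on a single shared vocabulary.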
6.3.1.2 Vision, Other Sensing, Sensor Fusion
The theories of computational neuroscientist Christoph von der Malsburg on complex, nonlinear
behavior in neurons have driven the development of numerous successful vision algorithms (von der
Malsburg and Schneider, 1986). One descendant of von der Malsburg's work, developed by his
student Hartmut Neven and sold as NevenVision FFT, stands out as the most successful tracker of
human facial expressions from live streaming video. NevenVision modules use these theories to
accomplish numerous other vision tasks as well, including biometric face recognition and object and
gesture recognition. The author of this chapter is currently investigating the use of this software
to endow social robots with emotional-expression recognition in context-driven conversation.
The automated face analysis (AFA) software system developed in the Carnegie Mellon Univer-
sity Face Lab determines the emotional state of a subject by automatically analyzing images against
Ekman's facial action coding system (FACS) (Xiao et al., 2002). While this AFA FACS analysis does
not yet run in real time, if optimized and integrated with fast and robust expression recognition
software, it will greatly advance progress toward complete and effective sociable robot systems.
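As a hedged sketch of the final step such a FACS-based pipeline performs, the fragment below maps a set of detected facial action units (AUs) to prototypical emotion labels. The AU combinations are commonly cited approximations of Ekman's prototypes, not the AFA system's actual rules, and the overlap scoring and threshold are invented for illustration; real classifiers also use AU intensities and temporal dynamics.

```python
# Illustrative mapping from detected facial action units (AUs) to
# prototypical emotion labels. AU sets are approximations drawn from the
# FACS literature; the scoring rule is a made-up placeholder.
PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},  # brow lowerer + lid raiser/tightener + lip tightener
}

def label_expression(detected_aus):
    """Return the prototype whose AU set best overlaps the detected AUs."""
    def overlap(emotion):
        proto = PROTOTYPES[emotion]
        return len(proto & detected_aus) / len(proto)
    best = max(PROTOTYPES, key=overlap)
    return best if overlap(best) > 0.5 else "neutral/other"

print(label_expression({6, 12, 25}))  # -> happiness
print(label_expression({4}))          # -> neutral/other
```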
Building on the work of Steels and Kaplan (2002) described in section 6.3.1.1, among others, Sony has
demonstrated the integration of many visual and perceptual systems and speech in its Qrio biped.
The Qrio can biometrically identify a face, recognize and respond to a person's facial expressions,
and recognize objects and environmental attributes. The visual ontologies are fused with the
semantic language ontologies, allowing Qrio to converse in a simple but lifelike way about a
number of subjects. This work is a forerunner of integrated machine intelligence systems with
nimble humanlike embodiment.
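The following speculative sketch (not Sony's implementation) illustrates what fusing a visual ontology with a semantic language ontology can mean in practice: a label produced by the vision system indexes into a small knowledge base shared with the language system, which then grounds a simple utterance in what the robot currently sees. The ontology entries, confidence threshold, and phrasing are all invented placeholders.

```python
# Speculative illustration of visual/semantic ontology fusion for simple
# grounded conversation. The knowledge base and threshold are assumptions.
ONTOLOGY = {
    "ball":   {"is_a": "toy",      "property": "round",    "action": "kick"},
    "person": {"is_a": "agent",    "property": "friendly", "action": "greet"},
    "stairs": {"is_a": "obstacle", "property": "steep",    "action": "avoid"},
}

def comment_on(visual_label: str, confidence: float) -> str:
    """Turn a vision-system label plus shared concept knowledge into an utterance."""
    node = ONTOLOGY.get(visual_label)
    if node is None or confidence < 0.6:
        return "I am not sure what I am looking at."
    return (f"I see a {visual_label}. It is a {node['is_a']} and it looks "
            f"{node['property']}. Maybe I should {node['action']} it.")

# Example: the vision system reports a label and a confidence score.
print(comment_on("ball", 0.92))
print(comment_on("lamp", 0.88))
```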
6.3.2 Social Intelligence, Social Robots, and Robot Visual Identity
Social robots in particular require the fusion of many perceptual, language, and physical embodiment
systems, requirements that drive the systematic integration of these components into a
whole that is greater than the sum of its parts (Breazeal, 2002).