Table 1. Activity classes used in this work and the objects recognized in each
semantic location context

location      activity                                      objects
kitchen       walking, standing, cutting, scrambling        worktop, microwave, floor
dining room   walking, standing, sitting, eating            dining table, floor
living room   walking, standing, sitting, lying, watching   floor, chair, sofa, TV
study room    walking, standing, sitting, typing            computer, chair, floor

able to recognize activities involving subtle motions, such as cutting, scrambling,
eating, and typing. The fine-level analysis of activities enables discrimination of
more types of objects in the environment.
To infer objects, the relationship between objects and activities needs to be
modeled. Probabilistic graphical models are good candidates for modeling the
relations between objects, user activities and other events. However, such
relationships can be quite complex in real applications, and building a graphical
model manually can become intractable as its scale increases. Moreover, a single
inclusion or removal of a variable, or a modification of a relation, may result in
many changes in the graphical model. It is therefore crucial to employ a
framework which can (a) handle such complex relations in an intuitive and scalable
fashion, and (b) model the vision output and high-level deductions in a
statistical way. In this paper we use a Markov logic network (MLN) [1] to interface
vision processing outputs and high-level reasoning. An MLN can be regarded as
a template for constructing Markov networks. The advantage of an MLN is that it
intuitively models various relations between objects and user activities in
first-order logic, which serves as the knowledge base for inference. Each formula in
the knowledge base has a weight, representing the confidence associated with it.
Given observations, the MLN is grounded into a Markov random field (MRF), through
which the probabilities of the variables can be inferred. MLNs have
been used for event recognition in visual surveillance [2], where their advantage in
accommodating commonsense knowledge into event inference is demonstrated.
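The grounding step described above can be illustrated with a minimal sketch. The constants (`r1`, `r2`), predicate names, and weights below are hypothetical and chosen only for illustration; they are not the knowledge base used in this paper. Each template has the form "for all regions r: body(r) => head(r)", and exact enumeration over possible worlds is used only because the example is tiny.

```python
import itertools
import math

# Hypothetical constants and predicates (assumed for illustration only).
REGIONS = ["r1", "r2"]
PREDICATES = ["Cutting", "Typing", "Worktop", "Computer"]

# First-order templates: (weight, body, head) meaning
# "for all regions r: body(r) => head(r)". Weights are made up.
TEMPLATES = [
    (2.0, "Cutting", "Worktop"),
    (2.0, "Typing", "Computer"),
]

def ground_atoms():
    """All ground atoms: every predicate applied to every constant."""
    return [(p, r) for p in PREDICATES for r in REGIONS]

def ground_formulas():
    """Instantiate every template for every constant."""
    return [(w, (body, r), (head, r))
            for w, body, head in TEMPLATES for r in REGIONS]

def log_score(world, formulas):
    """Sum of the weights of the ground formulas satisfied in this world."""
    return sum(w for w, b, h in formulas if (not world[b]) or world[h])

def query(atom, evidence):
    """P(atom = True | evidence) by enumerating all consistent worlds."""
    formulas = ground_formulas()
    free = [a for a in ground_atoms() if a not in evidence]
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = {**evidence, **dict(zip(free, values))}
        s = math.exp(log_score(world, formulas))
        den += s
        if world[atom]:
            num += s
    return num / den

# Observed activities: cutting was seen in region r1 only.
evidence = {
    ("Cutting", "r1"): True, ("Cutting", "r2"): False,
    ("Typing", "r1"): False, ("Typing", "r2"): False,
}
p_worktop_r1 = query(("Worktop", "r1"), evidence)
p_worktop_r2 = query(("Worktop", "r2"), evidence)
```

In this toy setting, observing cutting in `r1` raises the probability that `r1` is a worktop above one half, while `r2`, where no relevant activity was observed, stays at one half. A practical system would replace the enumeration with approximate inference, since the number of worlds grows exponentially with the number of ground atoms.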
The contributions of this work are as follows. (1) We propose to recognize
objects through human activities when the object category has changing
appearance and when the object can be identified through human interaction. This
approach is especially helpful for recognizing objects in a smart home
environment. (2) We demonstrate that fine-level activities in the home environment can
be analyzed and that they are effective for differentiating many types of objects. (3) We
propose to use a Markov logic network to interface vision and semantic reasoning,
and to encode the relational structure between objects and user activities in our
prior knowledge. The model is capable of handling complex relationships in a
scalable way. Another advantage of MLN over Markov networks is that it can
handle both soft and hard constraints (relationships), which we exploit in our
approach.
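The difference between soft and hard constraints can be sketched as follows. A soft formula contributes its weight when satisfied, so violating it only makes a world less likely; a hard formula acts like an infinite weight, so any world that violates it gets probability zero. The atoms, weights, and the "one object label per region" rule here are assumptions made for illustration, not constraints taken from this paper.

```python
import itertools
import math

# Hypothetical ground atoms for one image region (assumed for illustration).
ATOMS = ["Sitting", "Chair", "Sofa"]

# Soft constraints: (weight, formula). A violated formula only lowers the score.
SOFT = [
    (1.0, lambda w: (not w["Sitting"]) or w["Chair"]),  # Sitting => Chair
    (1.0, lambda w: (not w["Sitting"]) or w["Sofa"]),   # Sitting => Sofa
]

# Hard constraints: formulas every possible world must satisfy.
# Assumed rule: a region carries at most one of the two object labels.
HARD = [lambda w: not (w["Chair"] and w["Sofa"])]

def world_weight(world):
    """Unnormalized weight; zero if any hard constraint is violated."""
    if not all(h(world) for h in HARD):
        return 0.0
    return math.exp(sum(wt for wt, f in SOFT if f(world)))

def posterior(evidence):
    """Map each assignment of the unobserved atoms to its probability."""
    free = [a for a in ATOMS if a not in evidence]
    weights = {}
    for values in itertools.product([False, True], repeat=len(free)):
        world = {**evidence, **dict(zip(free, values))}
        weights[values] = world_weight(world)
    z = sum(weights.values())
    return {k: v / z for k, v in weights.items()}

# Keys of dist are (Chair, Sofa) truth values, given that sitting was observed.
dist = posterior({"Sitting": True})
```

Here the world in which the region is both a chair and a sofa receives zero probability because of the hard constraint, while the world with neither label keeps a small nonzero probability because it only violates soft formulas.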
The rest of the paper is organized as follows. Sec. 2 summarizes related work
on object recognition and activity classification. An overview of our
system is presented in Sec. 3. The hierarchical activity recognition with multiple
 