Recognizing Objects in Smart Homes Based on Human Interaction - Advanced Concepts for Intelligent Vision Systems - page 131


Table 1. Activity classes used in this work and the objects recognized in each semantic location context

Location      Activities                                    Objects
kitchen       walking, standing, cutting, scrambling        worktop, microwave, floor
dining room   walking, standing, sitting, eating            dining table, floor
living room   walking, standing, sitting, lying, watching   floor, chair, sofa, TV
study room    walking, standing, sitting, typing            computer, chair, floor

able to recognize activities involving subtle motions, such as cutting, scrambling, eating and typing. This fine-level analysis of activities enables the discrimination of more types of objects in the environment.

To infer objects, the relationships between objects and activities need to be modeled. Probabilistic graphical models are good candidates for modeling the relations between objects, user activities and other events. However, such relationships can be quite complex in real applications, and building a graphical model manually can become intractable as its scale increases. Moreover, adding or removing a single variable, or modifying a single relation, may require many changes to the graphical model. It is therefore crucial to employ a framework that can (a) handle such complex relations in an intuitive and scalable fashion, and (b) model the vision output and high-level deductions statistically. In this paper we use a Markov logic network (MLN) [1] to interface vision processing outputs with high-level reasoning. An MLN can be regarded as a template for constructing Markov networks. Its advantage is that it intuitively models the various relations between objects and user activities in first-order logic, and these formulas serve as the knowledge base for inference. Each formula in the knowledge base has a weight representing the confidence associated with it. Given observations, the MLN is grounded into a Markov random field (MRF), and the probabilities of the query variables can then be inferred through the MRF. MLNs have been used for event recognition in visual surveillance [2], where their advantage in incorporating commonsense knowledge into event inference was demonstrated.
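
The grounding step can be made concrete with a small example. In the standard MLN formulation, a possible world x of the ground network has probability P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i, n_i(x) is the number of its groundings that are true in x, and Z normalizes over all worlds. The Python sketch below is a minimal illustration under stated assumptions, not the knowledge base used in this work: the predicates Activity(r, a) and Is(r, o), the three formulas, their weights, and the helper names ground and infer_marginals are all hypothetical. It grounds each formula template once per observed region and computes exact marginals by enumerating every world, which is only feasible for tiny networks.

```python
import itertools
import math

# Hypothetical query objects and weighted formulas, chosen only for
# illustration; they are NOT the knowledge base used in the paper.
OBJECTS = ["chair", "sofa"]

# Each template encodes  Activity(r, a) => Is(r, o)  over a region variable r:
#   (weight, activity a, object o)
FORMULAS = [
    (2.0, "lying", "sofa"),
    (1.0, "sitting", "chair"),
    (0.5, "sitting", "sofa"),
]


def ground(evidence):
    """Instantiate every formula template for every region constant.

    evidence maps region -> set of observed activities. Returns a list of
    (weight, region, activity_observed, object) ground implications.
    """
    groundings = []
    for region, activities in evidence.items():
        for weight, activity, obj in FORMULAS:
            groundings.append((weight, region, activity in activities, obj))
    return groundings


def infer_marginals(evidence):
    """Exact inference by enumerating every world of the ground MRF."""
    groundings = ground(evidence)
    # One Boolean query atom Is(r, o) per region/object pair.
    atoms = [(r, o) for r in evidence for o in OBJECTS]
    z = 0.0
    marginals = {atom: 0.0 for atom in atoms}
    for values in itertools.product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        # Sum the weights of satisfied ground formulas; an implication
        # A => B is satisfied unless A holds and B does not.
        score = sum(w for w, r, a_holds, o in groundings
                    if not a_holds or world[(r, o)])
        p = math.exp(score)  # unnormalized weight of this world
        z += p
        for atom, value in world.items():
            if value:
                marginals[atom] += p
    return {atom: m / z for atom, m in marginals.items()}
```

For example, infer_marginals({"r1": {"lying"}, "r2": {"sitting"}}) gives Is(r1, sofa) a probability of about 0.88 (the logistic function of its weight 2.0) and leaves Is(r1, chair) at exactly 0.5, because the only ground formula mentioning that atom is vacuously true when sitting is not observed at r1. Adding a region to the evidence adds its groundings automatically, without editing the network by hand. Practical systems typically replace enumeration with approximate inference such as MCMC sampling.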

The contributions of this work are as follows. (1) We propose to recognize objects through human activities when objects of a category vary in appearance and can be identified through human interaction. This approach is especially helpful for recognizing objects in a smart home environment. (2) We demonstrate that fine-level activities in the home environment can be analyzed and are effective for differentiating many types of objects. (3) We propose to use a Markov logic network to interface vision and semantic reasoning, and to encode the relational structure between objects and user activities as prior knowledge. The model is capable of handling complex relationships in a scalable way. Another advantage of MLNs over Markov networks is that they can handle both soft and hard constraints (relationships), which we exploit in our approach.
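
The difference between soft and hard constraints can be illustrated with a tiny example. A soft formula only lowers the weight of worlds that violate it, whereas a hard constraint corresponds to a formula with infinite weight, so worlds that violate it receive probability zero. In this minimal Python sketch, the two soft formulas Sitting(r) => Is(r, o), their weights, the hard constraint that a region belongs to exactly one object category, and the function names marginals and exactly_one_object are illustrative assumptions rather than the knowledge base used in this work; the hard constraint is enforced by skipping violating worlds.

```python
import itertools
import math

# Hypothetical soft formulas  Sitting(r) => Is(r, o)  with illustrative weights.
SOFT = {"chair": 1.0, "sofa": 0.5}
OBJECTS = list(SOFT)


def exactly_one_object(world):
    """Hypothetical hard constraint: a region is exactly one object category."""
    return sum(world.values()) == 1


def marginals(sitting_observed, hard_constraints=()):
    """Exact marginals of Is(r, o) for a single region r by enumeration."""
    z = 0.0
    mass = {o: 0.0 for o in OBJECTS}
    for values in itertools.product([False, True], repeat=len(OBJECTS)):
        world = dict(zip(OBJECTS, values))
        # A hard constraint acts like a formula with infinite weight:
        # any world that violates it gets probability zero.
        if not all(check(world) for check in hard_constraints):
            continue
        # Soft formulas: a violated grounding only lowers the world's weight.
        score = sum(w for o, w in SOFT.items()
                    if not sitting_observed or world[o])
        p = math.exp(score)
        z += p
        for o in OBJECTS:
            if world[o]:
                mass[o] += p
    return {o: m / z for o, m in mass.items()}
```

With sitting observed, marginals(True) gives P(chair) ≈ 0.731 and P(sofa) ≈ 0.622, which sum to more than 1 because nothing prevents a region from being both. With the hard constraint, marginals(True, [exactly_one_object]) gives P(chair) ≈ 0.622 and P(sofa) ≈ 0.378, which sum to exactly 1.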

The rest of the paper is organized as follows. Sec. 2 summarizes related work on object recognition and activity classification. An overview of our system is presented in Sec. 3. The hierarchical activity recognition with multiple
