Recognizing Objects in Smart Homes Based on
Human Interaction
Chen Wu and Hamid Aghajan
AIR (Ambient Intelligence Research) Lab
Stanford University, USA
airlab.stanford.edu
Abstract. We propose a system to recognize objects with a camera network
in a smart home. Recognizing objects in a home environment from images is
challenging, due to the variation in the appearance of objects such as chairs,
as well as the clutter in the scene. Therefore, we propose to recognize objects
through user interactions. A hierarchical activity analysis is first performed
in the system to recognize fine-grained activities such as eating, typing, and
cutting. The object-activity relationship is encoded in the knowledge base of
a Markov logic network (MLN). An MLN has the advantage of encoding
relationships in an intuitive way with first-order logic syntax. It can also
handle both soft and hard constraints by associating weights with the formulas
in the knowledge base. Given activity observations, the defined MLN is
grounded and turned into a dynamic Bayesian network (DBN) to infer object
type probabilities. We expedite inference by decomposing the MLN into smaller
separate domains that relate to the currently active activity. Experimental
results are presented from our smart home testbed environment.
1 Introduction
In this paper we propose a system to recognize objects and room layout through
a camera network in a smart home. Recognizing objects such as tables, chairs,
and sofas in a home environment is challenging. First, many objects such as
chairs and desks vary widely in appearance and shape. Second, they are viewed
by the cameras from different viewpoints. Third, cameras installed in rooms
often have a wide field of view, so images are usually cluttered with many
objects, while some objects of interest may occupy only a small image region.
However, many objects are defined by their functions to users and not
necessarily by their appearance. Such objects can be recognized indirectly
from human activities during interaction with the objects.
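
To make this idea concrete, object-activity relationships can be written as
weighted first-order formulas in the MLN knowledge base. The formulas below
are purely illustrative; the predicates, weights, and hard constraint are our
assumptions, not the knowledge base used in this paper. Higher weights mark
stronger soft constraints, and a formula with infinite weight acts as a hard
constraint:

    2.0   Activity(p, Eating, t) ∧ Interacts(p, o, t) ⇒ Type(o, DiningTable)
    1.5   Activity(p, Typing, t) ∧ Interacts(p, o, t) ⇒ Type(o, Desk)
    ∞     Type(o, Sofa) ⇒ ¬Type(o, DiningTable)    (hard: types are exclusive)

During inference, each satisfied ground formula contributes its weight to the
log-probability of a world, so observing a person eating at an object raises
the probability that the object is a dining table without making it certain.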
In our work, objects in the kitchen, dining room, living room, and study room
are recognized based on the activities analyzed from the camera network. The
object types and activity classes in each semantic location are listed in
Table 1. We adopt a hierarchical approach to activity recognition, with
coarse- and fine-level recognition stages that use different image features.
In addition to the simpler pose-related activities such as standing, sitting,
and lying, we also recognize fine-grained activities such as eating, typing,
and cutting.
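
As a minimal sketch of the inference step, the following Python fragment fuses
a stream of observed activities into object-type probabilities with a plain
Bayesian update. It stands in for the grounded-MLN/DBN inference of the actual
system; all activity labels, object types, and likelihood values are assumed
for illustration only.

    # Illustrative sketch, not the authors' implementation: object-type
    # inference reduced to a plain Bayesian update over a stream of
    # observed activities. All labels and numbers are assumed.

    OBJECT_TYPES = ("dining_table", "desk", "sofa")

    # Assumed P(activity | object type): how likely each activity is
    # when a user interacts with an object of the given type.
    LIKELIHOOD = {
        "eating": {"dining_table": 0.70, "desk": 0.20, "sofa": 0.10},
        "typing": {"dining_table": 0.15, "desk": 0.80, "sofa": 0.05},
        "lying":  {"dining_table": 0.02, "desk": 0.03, "sofa": 0.95},
    }

    def update_belief(belief, activity):
        """One Bayesian update of the object-type distribution after an
        activity is observed while the user interacts with the object."""
        scores = {t: belief[t] * LIKELIHOOD[activity][t] for t in OBJECT_TYPES}
        total = sum(scores.values())
        return {t: s / total for t, s in scores.items()}

    belief = {t: 1.0 / len(OBJECT_TYPES) for t in OBJECT_TYPES}  # uniform prior
    for observed in ("typing", "typing", "eating"):  # example activity stream
        belief = update_belief(belief, observed)

    # After this stream, "desk" dominates the distribution.
    print(belief)

Unlike this simplification, the MLN formulation also captures hard constraints
and relations among multiple objects, which is why the paper grounds the
network into a DBN rather than updating each object independently.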
 