Recognizing Objects in Smart Homes Based on Human Interaction - Advanced Concepts for Intelligent Vision Systems

4 Activity Analysis in the Camera Network

The major challenges for activity recognition in the home environment include:

1. The person is often occluded by furniture.
2. Since the person freely moves and turns around while the cameras are static, the cameras may not always have a good viewpoint to observe the activity.
3. Activities in the home can have quite disparate characteristics. While activities such as lying can be distinguished from the pose, the kitchen activities usually have simple poses with subtle hand motions.
4. A fusion mechanism is needed either at the feature or the decision level.

For the overall activity recognition system, we use a hierarchical approach that classifies user activities through visual analysis in a two-level process. Different types of activities are often best represented by different image features, so attempting to classify all activities with a single approach would be ineffective. In Fig. 1(b), activities are represented at coarse and fine levels. The coarse activity level includes the classes standing, sitting and lying, which relate to the pose of the user. In the second level, global motion information and face detection add further attributes to standing and sitting, so that walking and watching can be discriminated. The fine activity level also contains activities involving movement, such as cutting, eating and reading. We apply such a hierarchical approach because the first-level activities are discriminated based on pose, while the second-level activities are classified based on motion features.
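The two-level flow described above can be illustrated with a minimal sketch. The rules that map standing plus global motion to walking and sitting plus a detected face to watching are one plausible reading of the text, not a confirmed part of the system; the function name, the `motion_threshold` value and the `fine_classifier` callback are hypothetical.

```python
from typing import Callable, Dict

COARSE_LABELS = ("standing", "sitting", "lying")


def classify_hierarchical(
    coarse_label: str,
    global_motion: float,
    face_detected: bool,
    fine_classifier: Callable[[], str],
    motion_threshold: float = 0.5,  # hypothetical value, not from the source
) -> Dict[str, str]:
    """Combine a pose-based coarse label with second-level cues."""
    if coarse_label not in COARSE_LABELS:
        raise ValueError(f"unknown coarse label: {coarse_label}")
    # Assumed rule: large global motion while standing indicates walking.
    if coarse_label == "standing" and global_motion > motion_threshold:
        return {"coarse": coarse_label, "fine": "walking"}
    # Assumed rule: a detected face while sitting indicates watching.
    if coarse_label == "sitting" and face_detected:
        return {"coarse": coarse_label, "fine": "watching"}
    # Assumption: lying is fully described by pose and is not refined further.
    if coarse_label == "lying":
        return {"coarse": coarse_label, "fine": "lying"}
    # Otherwise defer to the motion-feature classifier (cutting, eating, ...).
    return {"coarse": coarse_label, "fine": fine_classifier()}
```

Passing the fine-level classifier as a callback means the motion features only need to be evaluated when the cheaper pose-level cues do not already settle the label.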

In the first level, activity is coarsely classified into standing, sitting and lying with a temporal conditional random field (CRF), using a feature set consisting of the height of the user (obtained through 3D tracking) and the aspect ratio of the user's bounding box. Details of the process and the performance evaluation can be found in [14].
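To make the role of the temporal model concrete, the sketch below computes the two per-frame features and decodes a label sequence with a linear-chain Viterbi pass. It is not the trained CRF of [14]: the per-label scores, the prototype values and the `switch_penalty` are hand-set and hypothetical, and the aspect ratio is assumed to be width divided by height.

```python
from typing import List, Sequence, Tuple

LABELS = ("standing", "sitting", "lying")


def frame_features(height_m: float, box_w: float, box_h: float) -> Tuple[float, float]:
    """Return (user height, bounding-box aspect ratio) for one frame."""
    if box_h <= 0:
        raise ValueError("bounding-box height must be positive")
    return height_m, box_w / box_h  # assumption: aspect ratio = width / height


def unary_scores(feat: Tuple[float, float]) -> List[float]:
    """Hypothetical hand-set scores; a trained CRF would learn these."""
    height, aspect = feat
    return [
        -abs(height - 1.7) - abs(aspect - 0.4),  # standing: tall, narrow box
        -abs(height - 1.2) - abs(aspect - 0.8),  # sitting: medium height
        -abs(height - 0.4) - abs(aspect - 2.5),  # lying: low, wide box
    ]


def decode(features: Sequence[Tuple[float, float]], switch_penalty: float = 1.0) -> List[str]:
    """Viterbi decoding: best label sequence under a label-change penalty."""
    if not features:
        return []
    n = len(LABELS)
    score = unary_scores(features[0])
    back: List[List[int]] = []
    for feat in features[1:]:
        unary = unary_scores(feat)
        new_score, pointers = [], []
        for j in range(n):
            # Score of arriving in label j from each previous label i.
            arrive = [score[i] - (0.0 if i == j else switch_penalty) for i in range(n)]
            best_i = max(range(n), key=arrive.__getitem__)
            new_score.append(arrive[best_i] + unary[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final label.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [LABELS[s] for s in reversed(path)]
```

With a non-zero penalty, a single noisy frame inside a run of standing frames is smoothed away, which is the practical benefit of a temporal model over frame-by-frame classification.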

Based on the result of the coarse level, the activity is further classified at the fine level using several image features. Activities characterized by local motion are recognized from spatio-temporal features [15]. A codebook of size N is constructed by K-means clustering on a random subset of all spatio-temporal features extracted from the training dataset. Each feature is assigned to the closest cluster in Euclidean distance. The video sequences are segmented into episodes with a duration of t seconds. A bag of features (BoF) is collected for every episode, so each episode has the histogram of its spatio-temporal features as its feature vector. We use discriminative learning with an SVM.

Note that we also include others as an activity category in the experiments, because our sequences are not specifically designed for the defined activity types. There are many observations in which the activity is in a transition phase, or the person is simply doing something that does not fall within our defined categories. This is also a challenge for our activity recognition algorithm: because others includes many different motions, its feature space is complex. However, the applications built on top of activity analysis discussed in this paper are less sensitive to false positives on others, because the system is usually designed to perform no operation when the user's activity is not specific. Details of the experiments and performance can be found in [16].
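The bag-of-features step of the fine level can be sketched as follows, assuming the codebook has already been obtained by K-means and that each extracted feature comes with a timestamp. The function names are hypothetical, and normalizing the histogram to sum to one is an assumption; the source only states that a histogram is used.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest_codeword(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `feature` in Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: math.dist(feature, codebook[k]))


def split_into_episodes(
    detections: Sequence[Tuple[float, Vector]], t: float
) -> List[List[Vector]]:
    """Group (timestamp_seconds, descriptor) pairs into consecutive t-second episodes.

    Episodes that contain no descriptors are omitted from the result.
    """
    if t <= 0:
        raise ValueError("episode duration t must be positive")
    episodes: Dict[int, List[Vector]] = {}
    for timestamp, descriptor in detections:
        episodes.setdefault(int(timestamp // t), []).append(descriptor)
    return [episodes[k] for k in sorted(episodes)]


def bof_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Histogram of codeword assignments for one episode (length N)."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    # Assumption: normalize so that episodes with different feature counts are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

Each resulting histogram would then serve as the feature vector of one episode for SVM training and prediction, with others treated as one more class label.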
