4 Activity Analysis in the Camera Network
The major challenges for activity recognition in the home environment include:
1. The person is often occluded by furniture.
2. Since the person moves and turns freely while the cameras are static, the cameras may not always have a good viewpoint from which to observe the activity.
3. Activities in the home can have quite disparate characteristics. While activities such as lying can be distinguished from the pose, kitchen activities usually involve simple poses with subtle hand motions.
4. A fusion mechanism is needed, either at the feature level or at the decision level.
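The text does not say here which fusion level is used, so the following is only a sketch contrasting the two options named in challenge 4. The per-camera score dictionaries, the per-camera weights (for example, down-weighting an occluded or badly placed camera) and the weighted-sum rule are all hypothetical assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical per-camera output: a score per activity class
# (e.g. a classifier posterior). Not the paper's actual data format.
CameraScores = Dict[str, float]


def feature_level_fusion(per_camera_features: List[List[float]]) -> List[float]:
    """Feature-level fusion: concatenate every camera's feature vector
    into one vector that a single classifier would consume."""
    fused: List[float] = []
    for features in per_camera_features:
        fused.extend(features)
    return fused


def decision_level_fusion(per_camera_scores: List[CameraScores],
                          weights: List[float]) -> str:
    """Decision-level fusion: each camera classifies on its own, the
    per-class scores are combined with a (hypothetical) weighted sum,
    and the highest-scoring class is returned."""
    totals: Dict[str, float] = {}
    for scores, weight in zip(per_camera_scores, weights):
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + weight * score
    return max(totals, key=lambda label: totals[label])
```

For example, if an occluded camera (weight 0.2) slightly prefers standing while a camera with a clear view (weight 0.8) strongly prefers sitting, the decision-level result is sitting. Feature-level fusion keeps more information but needs a classifier trained on the joint vector; decision-level fusion lets each camera be handled independently.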
For the overall activity recognition system, we use a hierarchical approach that classifies user activities through visual analysis in a two-level process. Different types of activities are often represented by different image features, so attempting to classify all activities with a single approach would be ineffective. In Fig. 1(b), activities are represented at a coarse level and a fine level. The coarse activity level includes the classes standing, sitting and lying, which relate to the pose of the user. In the second level, global motion information and face detection add further attributes to standing and sitting in order to discriminate walking and watching. The fine activity level also includes activities involving movement, such as cutting, eating and reading. We apply this hierarchical approach because the first-level activities are discriminated based on pose, while the second-level activities are classified based on motion features.
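The two-level dispatch described above can be sketched as follows. The motion threshold, the pairing of global motion with walking and of face detection with watching, and the `fine_classifier` hook are hypothetical; the text only states that pose drives the first level and motion and face cues drive the second.

```python
from typing import Callable, Optional

# Hypothetical threshold on a normalized global-motion measure
# (e.g. speed of the 3D track); not a value from the paper.
GLOBAL_MOTION_THRESHOLD = 0.5


def classify_activity(coarse_pose: str,
                      global_motion: float,
                      face_detected: bool,
                      fine_classifier: Optional[Callable[[], str]] = None) -> str:
    """Refine a coarse pose label (standing / sitting / lying) into a
    finer activity label using hypothetical second-level rules."""
    if coarse_pose == "lying":
        return "lying"  # the pose alone is discriminative
    if coarse_pose == "standing" and global_motion > GLOBAL_MOTION_THRESHOLD:
        return "walking"  # global motion attribute added to standing
    if coarse_pose == "sitting" and face_detected:
        return "watching"  # face detection attribute added to sitting
    if fine_classifier is not None:
        # Local-motion activities (cutting, eating, reading, ...) are
        # delegated to a separate motion-feature classifier, which is
        # only invoked when the cheaper rules above do not fire.
        return fine_classifier()
    return coarse_pose
```

Keeping the fine classifier behind a callable mirrors the hierarchy: the more expensive motion analysis only runs once the coarse level has narrowed down the candidates.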
In the first level, activity is coarsely classified into standing, sitting and lying with a temporal conditional random field (CRF), using a set of features consisting of the height of the user (obtained through 3D tracking) and the aspect ratio of the user's bounding box. Details of the process and the performance evaluation can be found in [14].
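To illustrate why a temporal CRF helps, here is a minimal sketch of MAP decoding (Viterbi) in a linear-chain CRF over per-frame (height, aspect ratio) features. The actual features and learned weights are described in [14]; the class prototypes, the squared-distance unary potentials and the switch penalty below are hypothetical hand-set values standing in for a trained model.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ["standing", "sitting", "lying"]

# Hypothetical prototypes: (user height in metres, bounding-box
# height/width aspect ratio). A trained CRF would learn its potentials.
PROTOTYPES: Dict[str, Tuple[float, float]] = {
    "standing": (1.7, 3.0),
    "sitting": (1.2, 1.5),
    "lying": (0.5, 0.4),
}
SWITCH_PENALTY = 1.0  # hypothetical cost of changing pose between frames


def unary(label: str, height: float, aspect: float) -> float:
    """Per-frame log-potential of a label (higher is better)."""
    ph, pa = PROTOTYPES[label]
    return -((height - ph) ** 2 + (aspect - pa) ** 2)


def transition(prev: str, cur: str) -> float:
    """Log-potential between consecutive labels: staying in a pose is free."""
    return 0.0 if prev == cur else -SWITCH_PENALTY


def viterbi(frames: Sequence[Tuple[float, float]]) -> List[str]:
    """Most likely label sequence of the linear-chain CRF."""
    if not frames:
        return []
    # score[y]: best total log-potential of any path ending in label y
    score = {y: unary(y, *frames[0]) for y in LABELS}
    backptrs: List[Dict[str, str]] = []
    for height, aspect in frames[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            best_prev = max(LABELS, key=lambda p: score[p] + transition(p, y))
            new_score[y] = (score[best_prev] + transition(best_prev, y)
                            + unary(y, height, aspect))
            ptr[y] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With these values, a single ambiguous frame such as (1.3, 1.8) inside a run of standing frames would be labeled sitting by a per-frame classifier, but the transition penalty makes the decoder keep the whole run as standing; a sustained change of pose is still followed.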
Based on the result of the coarse level, the activity is further classified at the fine level using several image features. Activities related to local motion are recognized based on spatio-temporal features [15]. A codebook of size N is constructed with K-means clustering on a random subset of all the spatio-temporal features extracted from the training dataset. Each feature is assigned to the closest cluster in Euclidean distance. The video sequences are segmented into episodes with a duration of t seconds. A bag-of-features (BoF) histogram is collected for every episode, so each episode has a histogram of spatio-temporal features as its feature vector. We use discriminative learning with a support vector machine (SVM). Note that we also include "others" as an activity category in the experiments, because our sequences were not specifically designed around the defined activity types. There are many observations where the activities are in a transition phase, or where the person is simply doing something at random that does not fall within our defined categories. This is also a challenge for our activity recognition algorithm: since "others" covers many different motions, its feature space is complex. However, the applications built on top of activity analysis discussed in this paper are less sensitive to false positives on "others" (an activity wrongly labeled as "others"), because the system is usually designed to perform no operation when the user's activity is not specific. Details of the experiments and performance can be found in [16].
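The codebook and bag-of-features steps can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the descriptors are generic numeric vectors with timestamps (the spatio-temporal detector of [15] is not reproduced), K-means is plain Lloyd iteration with random initialization, and normalizing each histogram to sum to 1 is an assumption the text does not state.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (same nearest neighbour as Euclidean)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[List[float]], feature: Vector) -> int:
    """Index of the closest codeword in Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], feature))


def kmeans(features: List[Vector], n: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Lloyd's K-means; returns n codewords (cluster centres)."""
    rng = random.Random(seed)
    centres = [list(f) for f in rng.sample(features, n)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(n)]
        for f in features:
            clusters[nearest(centres, f)].append(f)
        for k, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster empties
                centres[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centres


def bof_histograms(timed_features: List[Tuple[float, Vector]],
                   codebook: List[List[float]],
                   episode_len: float) -> Dict[int, List[float]]:
    """One codeword histogram per episode of episode_len seconds,
    normalized to sum to 1 (the normalization is an assumption)."""
    hists: Dict[int, List[float]] = {}
    for timestamp, f in timed_features:
        episode = int(timestamp // episode_len)
        hist = hists.setdefault(episode, [0.0] * len(codebook))
        hist[nearest(codebook, f)] += 1.0
    for hist in hists.values():
        total = sum(hist)
        for k in range(len(hist)):
            hist[k] /= total
    return hists
```

Following the text, the codebook would be built from a random subset of the training descriptors, e.g. `kmeans(random.Random(1).sample(all_features, m), N)`, where `all_features`, the subset size `m`, and the values of N and t are placeholders; the text does not give them here.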
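The text only says that an SVM is used, without naming the kernel or the multi-class scheme. As a hedged illustration, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient solver and combines binary models one-vs-rest, treating "others" as just another class; the solver, the one-vs-rest scheme, the regularization value and the toy histograms are all assumptions swapped in for illustration.

```python
import random
from typing import Dict, List, Sequence


def train_linear_svm(xs: List[Sequence[float]], ys: List[int],
                     lam: float = 0.1, epochs: int = 100, seed: int = 0) -> List[float]:
    """Binary linear SVM (labels +1 / -1) trained with Pegasos.
    A constant 1.0 is appended to each input so the last weight is a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the L2 regularizer, applied every update.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:  # hinge loss active: move towards y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(xs: List[Sequence[float]], labels: List[str],
                      **kwargs) -> Dict[str, List[float]]:
    """One binary SVM per class (that class vs. all other classes)."""
    return {c: train_linear_svm(xs, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Class whose binary model gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xb)))


# Toy BoF histograms over a hypothetical 3-word codebook.
X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
     [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
Y = ["cutting", "cutting", "eating", "eating", "others", "others"]
MODELS = train_one_vs_rest(X, Y)
```

An application layer can then map a predicted "others" to a no-op, matching the behavior described above; in practice a kernel SVM (for example a chi-squared or histogram-intersection kernel, which are common choices for BoF histograms) could be substituted for the linear one.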
 