cameras is briefly explained in Sec. 4. Sec. 5 presents the MLN knowledge base
used for our problem and the inference flow and considerations. The testbed and
experimental results are described in Sec. 6.
2 Related Work
Image-based object detection approaches rely on local appearance models, grouping of geometric primitives, and learning from image patterns [3]. Recent work that exploits image contextual information shows promising results for object detection [4,5].
Using human activity as context to detect objects relies upon modeling the
relationship between activities and objects, as well as on vision-based analysis
to infer the activities. In [6] Peursum et al. label image segments with objects
such as floor, chair, keyboard, printer and paper in an office, based on features
of human pose. Gupta et al. [7] detect manipulable objects (cup, spray bottle,
phone, flashlight) from manipulation motion, reach motion, object reaction and
object evidence from an appearance-based object detector. Both approaches de-
fine a Bayesian model which employs image features and action or pose features
to infer the object type. Such an approach may be sensitive to the environment
and placement of cameras since vision processing is dependent on such factors.
However, semantic reasoning about object labels is less dependent on camera views and more a function of the deduced user activities. Therefore, separating vision processing from semantic reasoning allows the latter module to be transferred to other environments. A similar observation is made in [8], where layered hidden Markov models are used.
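To make the separation concrete, the following is a minimal sketch, assuming hypothetical activity labels, object labels and probabilities; it is not the MLN formulation used later in the paper. A camera-dependent vision module emits activity confidences, and a separate, camera-independent reasoning module maps those activities to likely object labels.

```python
# Minimal sketch of the two-stage idea (all labels and probabilities are
# hypothetical): stage 1 (vision, camera-dependent) produces activity
# confidences; stage 2 (semantic reasoning, camera-independent) maps
# activities to likely object labels.

# Hypothetical conditional weights P(object present | activity observed).
ACTIVITY_TO_OBJECT = {
    "typing":   {"keyboard": 0.9, "chair": 0.7, "printer": 0.1},
    "printing": {"printer": 0.8, "paper": 0.7, "keyboard": 0.2},
    "sitting":  {"chair": 0.9, "floor": 0.5},
}

def infer_objects(activity_scores, threshold=0.5):
    """Combine activity confidences with the activity-object table.

    activity_scores: dict mapping activity label -> confidence from the
    vision module. Returns objects whose accumulated evidence exceeds
    the threshold; this stage never touches pixels or camera geometry.
    """
    evidence = {}
    for activity, conf in activity_scores.items():
        for obj, p in ACTIVITY_TO_OBJECT.get(activity, {}).items():
            # Noisy-OR style accumulation of evidence from all activities.
            evidence[obj] = 1.0 - (1.0 - evidence.get(obj, 0.0)) * (1.0 - conf * p)
    return {obj: round(e, 2) for obj, e in evidence.items() if e >= threshold}

if __name__ == "__main__":
    # Hypothetical vision-module output: the user is probably typing.
    print(infer_objects({"typing": 0.8, "sitting": 0.6}))
```

Because the second stage depends only on the activity labels, the same activity-to-object mapping can be reused when the cameras or the room layout change, while only the first stage needs to be retrained.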
Vision-based human activity analysis has seen significant progress in recent
years [9], including advances in analyzing realistic activities in publicly available videos [10]. However, only a few works focus on activity
recognition in the home environment. In [11], situation models are extracted
from video sequences of a smart environment, which are essentially semantic-
level activities including both individual activities and two-person interactions.
Both [12] and [13] use video data and RFID for activity recognition. Wu et al.
in [12] use RFID readings and object detection from video to jointly estimate
activity and object use. The learning process is bootstrapped with commonsense
knowledge mined from the internet, and the object model from the video is
automatically acquired without labeling by leveraging RFID readings. Their
work infers activity indirectly from object use. Park et al. compare activity
recognition with RFID and vision [13]. They conclude that RFID-based recognition performs better for kitchen activities, which involve more object usage and whose visual features (e.g., silhouettes) are not very distinguishable, while vision-based recognition is more accurate for living room activities.
3 System Overview
Fig. 1(a) shows the two main steps for object recognition in our system. The
first step is activity analysis in the camera network. A detailed illustration of
 