although we have applied one of the fastest optical flow algorithms. The running time remains high because dense trajectories are extracted from every image sequence. In addition, we used only a single PC in our experiments; if the method were run on a parallel system, such as a cluster or a GPU-CPU high-performance computing platform, the performance would improve considerably.
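To make this cost concrete, the sketch below computes a dense flow field for every consecutive frame pair of a video, which is the dominant operation when extracting dense trajectories. It uses OpenCV's Farneback method purely as an illustrative stand-in for the flow algorithm actually employed; the function name, file path, and parameter values are assumptions, not those of our system.

    # Per-frame dense optical flow: the dominant cost in dense trajectory
    # extraction. Farneback is used here only as a stand-in flow algorithm.
    import cv2

    def dense_flow_for_video(path):
        cap = cv2.VideoCapture(path)
        ok, prev = cap.read()
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        flows = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # One flow field per frame pair; cost grows linearly with video length.
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None,
                pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            flows.append(flow)
            prev_gray = gray
        cap.release()
        return flows

Because every frame pair produces its own flow field, the computation parallelizes naturally across frames or videos, which is why a cluster or GPU-CPU setup would help.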
5 Conclusions
In this chapter, we proposed a method that uses both image features and motion features for gesture recognition in cooking videos; that is, the motions in a cooking video are represented by an image feature vector and motion feature vectors. In our method, a Bayesian network (BN) model predicts the action class of a given frame based on the action classes of previous frames and the cooking gesture in the current frame. Additional information, such as the sequence of actions, is also incorporated into the BN model to improve the classification results.
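The following is a minimal sketch of the kind of frame-by-frame inference described above: a belief over action classes is propagated through a transition model (the action classes of previous frames) and reweighted by a likelihood derived from the current frame's gesture features. It is a simplified two-node dynamic model standing in for the actual BN; the number of classes, the transition probabilities, and the example likelihood values are all illustrative assumptions.

    # Simplified frame-by-frame action-class inference: prior from previous
    # frames' classes, evidence from the current frame's gesture features.
    # All names and numbers are illustrative, not the chapter's actual model.
    import numpy as np

    N_CLASSES = 5                                    # e.g. cut, peel, mix, ...

    # P(class_t | class_{t-1}): rows index the previous class, columns the current one.
    transition = np.full((N_CLASSES, N_CLASSES), 0.05)
    np.fill_diagonal(transition, 0.8)                # actions tend to persist
    transition /= transition.sum(axis=1, keepdims=True)

    def step(prev_belief, gesture_likelihood):
        """One filtering step: propagate the previous belief, then weight by evidence."""
        prior = transition.T @ prev_belief           # influence of previous frames
        posterior = prior * gesture_likelihood       # P(gesture | class) for this frame
        return posterior / posterior.sum()

    # Usage: start from a uniform belief and fold in per-frame gesture likelihoods.
    belief = np.full(N_CLASSES, 1.0 / N_CLASSES)
    for likelihood in [np.array([0.7, 0.1, 0.1, 0.05, 0.05])]:
        belief = step(belief, likelihood)
        print(belief.argmax())                       # predicted action class

Extending such a model to a new action amounts to adding a row and column to the transition table and a likelihood term for the new class, which reflects the flexibility discussed below.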
According to our results, the proposed method is a promising approach to action recognition in video. Although its performance does not yet match that of the best method, we are confident it can be improved further. In addition, the method is flexible: more actions or other features can be added easily, and the BNs can be reconstructed and the parameters in their nodes updated with little effort. Thus, our method can be applied to other action recognition systems, even those involving many complex actions.
In the future, we plan to improve motion feature extraction, in particular by accelerating it, since it currently accounts for over 80% of the running time. Another improvement we can make in the near future is the use of high-level features; at present their application is still limited because they require considerable additional computation and time.