although we have applied one of the fastest optical flow algorithms. The running time remains high because dense trajectories are extracted from every image sequence. In addition, we used only a single PC in our experiments; if the method were run on a parallel system, such as a cluster or a GPU-CPU high-performance computing platform, the performance would improve considerably.
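To make this cost concrete, the sketch below computes a dense flow field for every consecutive frame pair of a video, which is the dominant operation when extracting dense trajectories. It uses OpenCV's Farneback method purely as an illustrative stand-in for the flow algorithm actually employed; the function name, file path, and parameter values are assumptions, not those of our system.

    # Per-frame dense optical flow: the dominant cost in dense trajectory
    # extraction. Farneback is used here only as a stand-in flow algorithm.
    import cv2

    def dense_flow_for_video(path):
        cap = cv2.VideoCapture(path)
        ok, prev = cap.read()
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        flows = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # One flow field per frame pair; cost grows linearly with video length.
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None,
                pyr_scale=0.5, levels=3, winsize=15,
                iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            flows.append(flow)
            prev_gray = gray
        cap.release()
        return flows

Because every frame pair produces its own flow field, the computation parallelizes naturally across frames or videos, which is why a cluster or GPU-CPU setup would help.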
5 Conclusions
In this chapter, we proposed a method that uses both image features and motion features for gesture recognition in cooking videos; that is, the motions in a cooking video are represented by an image feature vector and motion feature vectors. In our method, a Bayesian network (BN) model predicts the action class of a given frame based on the action classes of previous frames and the cooking gesture in the current frame. Additional information, such as the sequence of actions, is also incorporated into the BN model to improve the classification results.
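The following is a minimal sketch of the kind of frame-by-frame inference described above: a belief over action classes is propagated through a transition model (the action classes of previous frames) and reweighted by a likelihood derived from the current frame's gesture features. It is a simplified two-node dynamic model standing in for the actual BN; the number of classes, the transition probabilities, and the example likelihood values are all illustrative assumptions.

    # Simplified frame-by-frame action-class inference: prior from previous
    # frames' classes, evidence from the current frame's gesture features.
    # All names and numbers are illustrative, not the chapter's actual model.
    import numpy as np

    N_CLASSES = 5                                    # e.g. cut, peel, mix, ...

    # P(class_t | class_{t-1}): rows index the previous class, columns the current one.
    transition = np.full((N_CLASSES, N_CLASSES), 0.05)
    np.fill_diagonal(transition, 0.8)                # actions tend to persist
    transition /= transition.sum(axis=1, keepdims=True)

    def step(prev_belief, gesture_likelihood):
        """One filtering step: propagate the previous belief, then weight by evidence."""
        prior = transition.T @ prev_belief           # influence of previous frames
        posterior = prior * gesture_likelihood       # P(gesture | class) for this frame
        return posterior / posterior.sum()

    # Usage: start from a uniform belief and fold in per-frame gesture likelihoods.
    belief = np.full(N_CLASSES, 1.0 / N_CLASSES)
    for likelihood in [np.array([0.7, 0.1, 0.1, 0.05, 0.05])]:
        belief = step(belief, likelihood)
        print(belief.argmax())                       # predicted action class

Extending such a model to a new action amounts to adding a row and column to the transition table and a likelihood term for the new class, which reflects the flexibility discussed below.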
According to our results, the proposed method is a promising approach to action recognition in video. Although its performance does not yet match that of the best method, we are confident it can be improved further. In addition, the method is flexible: more actions or other features can be added easily, and the BNs can be reconstructed and the parameters in their nodes updated with little effort. Thus, our method can be applied to other action recognition systems, even those involving many complex actions.
In the future, we plan to improve motion feature extraction, in particular by accelerating it, since it currently accounts for over 80% of the running time. Another improvement we can make in the near future is the use of high-level features; at present their application is still limited because they require considerable additional computation and time.