Gesture recognition in cooking video based on image features and motion features using Bayesian network classifier - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition



Acknowledgments

We would like to thank Atsushi Shimada, Kazuaki Kondo, Daisuke Deguchi, Géraldine Morin, and Helman Stern for organizing the KSCGR contest, and Tomo Asakusa et al. from Kyoto University, Japan, for creating and distributing the Actions for Cooking Eggs dataset.

1 Introduction

Cooking and eating are daily routines that all of us must carry out in order to stay healthy. Although these are simple tasks that everyone performs every day, they are very important because of their impact on health. On the other hand, in modern society, time has become more precious than ever before. Many people do not have much time to cook for themselves, which directly affects their health. Therefore, the question “How could we have a delicious and nutritious dish with less cooking time?” has been raised for some time.

In recent years, researchers all over the world have been building various intelligent kitchen systems, which are anticipated to answer the above question. These systems are expected to help everyone cook faster and more efficiently. Such a system does not rely on a single solution; it combines solutions to many different problems, such as object recognition, human action recognition, and nutritious meal computation. All of these problems have been raised in the “Multimedia for Cooking and Eating Activities” workshop since 2009. Many complex challenges still exist, and no complete solution is available yet. Among these problems, we consider the recognition of human cooking actions to be the most challenging.

The objective of cooking action recognition is for a computer program to recognize cooking actions based on a training dataset. Furthermore, based on sequences of cooking actions, the program could predict what kind of dish is being prepared. In practice, we expect that the running program observes the actions of the user(s), recognizes these actions, and either warns the user(s) if something is wrong or suggests the next cooking steps. Therefore, we regard solving the cooking action recognition problem as the most important task in completing our intelligent kitchen system. This problem was posed in a contest [1] at the ICPR2012 conference as an interesting topic in the video retrieval field. Through this contest and much other research, numerous solutions have been proposed. Moreover, plenty of datasets have been created and distributed to researchers. One of them is the new “Actions for Cooking Eggs” (ACE) dataset [2], which was presented in contest [1] and which we used to evaluate our method in experiments.

In this chapter, a novel method for recognizing human actions in cooking video is proposed. Our method combines image features and motion features for gesture recognition. Because of the complexity of this problem, we divide it into four subproblems: cooking action representation by image features, cooking action representation by motion features, combination of image features and motion features, and cooking action classification. From a cooking video, the cooking actions are first represented by image features such as the pyramid histogram of oriented gradients (PHOG) [3], or scale
