Image Processing Reference
We would like to thank Atsushi Shimada, Kazuaki Kondo, Daisuke Deguchi, Géraldine Morin,
and Helman Stern for organizing the KSCGR contest, and Tomo Asakusa et al. from Kyoto
University, Japan, for creating and distributing the Actions for Cooking Eggs dataset.
Cooking and eating are daily routines that everyone must carry out in order to stay
healthy. Although these are simple tasks that people perform every day, they hold a very
important position because of their effect on health. On the other hand, in modern
society, time has become more precious than ever before. Many people do not have much
time to cook for themselves, and this has a direct impact on their health. Therefore, the
question “How can we prepare a delicious and nutritious dish with less cooking time?”
has been asked for some time.
In recent years, researchers all over the world have been building various intelligent kitchen
systems, which are anticipated to answer the above question. These systems are expected to
help everyone cook faster and more efficiently. Such a system does not rely on a single
solution; it combines solutions to many different problems, such as object recognition,
human action recognition, and nutritious meal computation. All of these problems have been
raised in the “Multimedia for Cooking and Eating Activities” workshop since 2009. Many
complex challenges still remain, and no complete solution exists yet. Among these problems,
we consider human cooking action recognition to be the most challenging.
The objective of cooking action recognition is for a computer program to recognize cooking
actions based on a training dataset. Furthermore, based on sequences of cooking actions, the
program could predict what kind of dish is being prepared. In practice, we expect that when
such a program runs, it observes the actions of the user(s), recognizes these actions, and
either warns the user(s) if something is wrong or suggests the next cooking steps. Therefore,
we consider solving the cooking action recognition problem to be the most important task for
realizing such a system, and also an interesting topic in the video retrieval field. Through
this contest and many other research efforts, numerous solutions have been proposed by many
researchers. Moreover, many datasets have been created and distributed to researchers. One
of them is the new “Actions for Cooking Eggs” dataset [1].
In this chapter, we propose a novel method for recognizing human actions in cooking videos.
The proposed method combines image features and motion features for gesture recognition.
Because of the complexity of this problem, we divide it into four subproblems: cooking
action representation by image features, cooking action representation by motion features,
combination of image features and motion features, and cooking action classification. From
a cooking video, the cooking actions are first represented by image features such as the
pyramid histogram of oriented gradients (PHOG) [3], or scale