Table 1
Action Recognition Precision (%)

Action       Image Features   Motion Features   Our Method
Breaking           0.0              12.7            15.0
Mixing             0.0              22.0            22.5
Baking            36.9              38.4            41.9
Turning            0.0              54.5            54.6
Cutting           26.1              17.3            17.4
Boiling            0.0              29.2            29.1
Seasoning          0.0              15.9            16.0
Peeling            0.0              27.5            27.5
Average            7.9              27.2            28.0
We therefore find that the motion features represent the cooking actions more effectively than the image features. Only for actions with large amplitude, such as baking, or actions performed with a large cooking tool, such as cutting with a knife, is recognition based on image features successful; image features do not describe the other actions, which have small amplitude or use small cooking tools. Moreover, when only image features (color and edge features) are used, actions that involve no color change and no cooking tool, such as boiling, breaking, or seasoning, cannot be recognized at all. We therefore combine the image features and the motion features and, as expected, obtain better results.
The results of the combination are shown in the last column of Table 1. Compared with the first two cases, combining the two features improves the recognition results. Some actions are recognized particularly well, especially "baking" and "turning," because they have large amplitude. However, for some actions the recognition does not improve over the motion features alone, because the image features may not describe those actions accurately. One way to address this is to refine the high-level image feature extraction so that it is more precise, for example by determining exactly which cooking tools are being used. Another is to combine the two features more effectively, for example by searching for better weighting parameters w1 and w2.
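As an illustration of this kind of weighted combination, the sketch below fuses per-class scores from the two feature channels and sweeps the weights w1 and w2 on a validation set. The score arrays, the constraint w1 + w2 = 1, and the use of frame-level accuracy as the selection criterion are assumptions made for the example, not details of the original system.

import numpy as np

def fuse_scores(image_scores, motion_scores, w1, w2):
    # Weighted late fusion of per-class scores from the two feature channels.
    # Both inputs are assumed to be arrays of shape (n_frames, n_classes).
    return w1 * image_scores + w2 * motion_scores

def grid_search_weights(image_scores, motion_scores, labels, steps=21):
    # Sweep w1 in [0, 1] with w2 = 1 - w1 and keep the weights that give the
    # highest frame-level accuracy on the validation labels.
    best_w1, best_acc = 0.0, -1.0
    for w1 in np.linspace(0.0, 1.0, steps):
        w2 = 1.0 - w1
        fused = fuse_scores(image_scores, motion_scores, w1, w2)
        predictions = fused.argmax(axis=1)
        acc = (predictions == labels).mean()
        if acc > best_acc:
            best_w1, best_acc = w1, acc
    return best_w1, 1.0 - best_w1, best_acc

Constraining w2 = 1 - w1 reduces the search to a one-dimensional sweep, which is usually enough when only two score channels are being fused.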
Moreover, the recognition results for each test video are shown in Table 2. Each video contains the sequence of actions needed to cook a particular dish. In our experiment, the best per-video precision, measured over all frames of a video, is above 40%. Although the average precision is only about 30%, the results indicate that our approach is viable; optimizing the parameters and improving the implementation should raise it further, so this work will be continued in the near future.
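For reference, the per-video figures can be computed with a frame-level measure such as the one sketched below. Treating precision as the fraction of correctly labeled frames in a video is an assumption made for this example, since the exact evaluation protocol is not given here.

import numpy as np

def video_precision(predicted_labels, true_labels):
    # Fraction of frames whose predicted action matches the ground truth
    # (assumed frame-level definition of per-video precision).
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    return float((predicted_labels == true_labels).mean())

# Hypothetical example: 3 of 5 frames are labeled correctly, giving 0.6.
print(video_precision([0, 1, 1, 2, 2], [0, 1, 2, 2, 1]))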
 