Gesture recognition in cooking video based on image features and motion features using Bayesian network classifier - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition

Several methods exist for matching feature vectors. In this problem, however, the number of feature vectors is too large for the original matching methods to be applied directly. We therefore use bag of features (BoF) [16] to describe image features and motion features, which speeds up the matching step. For each frame, we detect objects and compute the relations between them. Moreover, using the training labels, we learn rules about the sequences of actions. Finally, we construct the BNs and learn their parameters.
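
The BoF description mentioned above can be illustrated with a minimal sketch. It assumes a codebook of visual words has already been learned (for example by clustering training descriptors); the codebook, the descriptors, and the function names below are hypothetical and are not taken from the original system. Each descriptor is assigned to its nearest codeword, and the frame is then represented by a fixed-length normalized histogram, so frames can be compared without matching every raw feature vector.

```python
import math
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bof_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return a normalized histogram."""
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts[i] / total for i in range(len(codebook))]


# Hypothetical 2-D codebook and descriptors, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bof_histogram(descriptors, codebook)
```

The histogram length depends only on the codebook size, not on how many features a frame contains, which is what makes the later matching step cheap.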

For the last step, we first create three BNs and then train their parameters. The output of the training system is a model that describes the cooking actions. In the testing system, we use the trained model to classify cooking actions. In addition, the labels of the previous actions are used as additional input to the BN classifier. The output of the testing system is the action label of each frame. Lastly, we evaluate our method on this output by computing an accuracy score from precision and recall.¹ Their harmonic mean, the F-measure, is calculated by the following formula:

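$$ F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (1) $$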

The final score is the average of the F-measures over all cooking motion labels.
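
As a rough illustration of this evaluation, the sketch below computes a per-label F-measure as the harmonic mean of precision and recall, F = 2PR / (P + R), and averages it over the labels. The label names and frame sequences are hypothetical, and treating an undefined precision or recall as 0 is an assumption made here, not something stated in the text.

```python
def f_measure(true_labels, predicted_labels, label):
    """Harmonic mean of precision and recall for one label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    # Assumption: undefined precision/recall is counted as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(true_labels, predicted_labels):
    """Average the per-label F-measures over all labels in the ground truth."""
    labels = sorted(set(true_labels))
    scores = [f_measure(true_labels, predicted_labels, lab) for lab in labels]
    return sum(scores) / len(scores)


# Hypothetical per-frame labels, for illustration only.
truth = ["cut", "cut", "mix", "mix", "boil", "boil"]
predicted = ["cut", "mix", "mix", "mix", "boil", "cut"]
score = mean_f_measure(truth, predicted)
```

Averaging per label, rather than pooling all frames, gives rare actions the same weight as frequent ones.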

3.2 Preprocessing Input Data

In the preprocessing step, we prepare the data for use in the following steps. First, we calibrate the depth images, because the depth camera and the color camera are separated by a small distance. The Kinect disparity is related to a normalized disparity by

$$ d = \frac{1}{8}\,(d_{\mathrm{off}} - k_d) \qquad (2) $$

where d is the normalized disparity, k_d is the Kinect disparity, and d_off is an offset value particular to a given Kinect device. According to a technical report on Kinect calibration [17], d_off is typically around 1090. Moreover, every depth image has a narrow black band along its right edge, which we remove so that the depth image can later be mapped to the color image.
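
A minimal sketch of these two preprocessing operations follows. It assumes the relation d = (d_off - k_d) / 8 commonly given in Kinect calibration notes, with the offset defaulting to the typical value of 1090 quoted above. The band width is a hypothetical parameter, since the text does not state how wide the black band is, and the image is modeled as a plain list of rows for simplicity.

```python
def normalize_disparity(kinect_disparity, offset=1090.0):
    """Convert a raw Kinect disparity to a normalized disparity.

    Assumes d = (d_off - k_d) / 8; the offset is device-specific.
    """
    return (offset - kinect_disparity) / 8.0


def crop_right_band(depth_image, band_width):
    """Drop the last `band_width` columns of every row (hypothetical width)."""
    if band_width <= 0:
        return [list(row) for row in depth_image]
    return [list(row[:-band_width]) for row in depth_image]


# Hypothetical 2x5 raw disparity image whose last column is the black band.
raw = [
    [1010, 1010, 1002, 1002, 0],
    [1010, 1010, 1002, 1002, 0],
]
cropped = crop_right_band(raw, band_width=1)
normalized = [[normalize_disparity(v) for v in row] for row in cropped]
```

In practice the offset would be calibrated per device rather than left at the default.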

For fast segmentation, we use the depth images to segment the table area and the floor area. We take the first few frames and find the border that separates the table and floor areas. Because no human appears in these frames, segmentation is easy. Border detection is based on the sum of the grayscale values of the depth image in each row and on the ratio of disparity between two adjacent rows.
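
One way to picture the border search is the simplified sketch below. It sums the grayscale values in each row of a human-free depth frame and picks the row at which the ratio between adjacent row sums departs most from 1. This is an assumed simplification: the text does not specify how the row sums and the adjacent-row disparity ratio are combined, and the function names and the toy image are hypothetical.

```python
def row_sums(depth_image):
    """Sum of grayscale values in each row of the depth image."""
    return [sum(row) for row in depth_image]


def find_border_row(depth_image):
    """Return the index of the first row below the sharpest row-to-row change.

    Returns None if no pair of adjacent rows with positive sums differs.
    """
    sums = row_sums(depth_image)
    best_row, best_change = None, 0.0
    for r in range(len(sums) - 1):
        upper, lower = sums[r], sums[r + 1]
        if upper <= 0 or lower <= 0:
            continue  # skip rows with no valid depth readings
        ratio = max(upper, lower) / min(upper, lower)
        if ratio - 1.0 > best_change:
            best_row, best_change = r + 1, ratio - 1.0
    return best_row


# Hypothetical 5x4 depth frame: three rows of one surface, two of another.
frame = [
    [100, 100, 100, 100],
    [100, 100, 100, 100],
    [100, 100, 100, 100],
    [40, 40, 40, 40],
    [40, 40, 40, 40],
]
border = find_border_row(frame)
```

Running the search on several of the first frames and taking a consensus row would make the estimate less sensitive to sensor noise.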
