For matching the feature vectors, several matching methods are available. However, this problem involves a large number of feature vectors, to which the original matching methods cannot be applied directly, so a faster method is needed for speeding up the matching step. For each frame, we detect objects and also compute their relations. Moreover, using the training labels, we can learn some rules about the sequences of actions. Finally, we apply BN construction and parameter learning.
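The speedup alluded to above can be illustrated with a coarse bucketing index: instead of scanning every stored feature vector for each query, vectors are grouped by a quantized key and only the matching bucket is searched. This is a hypothetical sketch of one such approximate scheme, not the method used in the paper; the `cell` parameter is an assumed quantization step.

```python
def build_index(vectors, cell=0.5):
    # Bucket feature vectors by a coarse quantization of their coordinates.
    # Vectors whose components fall in the same grid cell share a key.
    index = {}
    for i, v in enumerate(vectors):
        key = tuple(int(x // cell) for x in v)
        index.setdefault(key, []).append(i)
    return index


def approx_match(query, vectors, index, cell=0.5):
    # Look only at vectors in the query's own cell; fall back to a full
    # scan when the cell is empty. Returns the index of the best match.
    key = tuple(int(x // cell) for x in query)
    candidates = index.get(key, range(len(vectors)))
    return min(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vectors[i], query)),
    )
```

Exact brute-force matching is O(n) per query; the bucket lookup reduces the candidate set at the cost of occasionally missing the true nearest neighbour in an adjacent cell.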
For the last step, we first create three BNs and then train their parameters. The output of the training system is a model that describes the cooking actions. In the testing system, we use the trained model to classify cooking actions. In addition, the action labels of the previous actions are used as additional input for the BN classifier. The action label of each frame is the output of the testing system. Lastly, we evaluate our method based on this output by calculating an accuracy score from precision and recall. Their harmonic mean, the F-measure F = 2 · precision · recall / (precision + recall), is then calculated for each label. The final score is given by averaging the F-measures of all cooking motion labels.
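The evaluation described above can be written out directly: compute the F-measure (the harmonic mean of precision and recall) per label, then average over all labels. The `per_label_pr` input format is an assumption for illustration.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall; 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def final_score(per_label_pr):
    # Average the F-measures over all cooking motion labels.
    # per_label_pr: list of (precision, recall) pairs, one per label.
    scores = [f_measure(p, r) for p, r in per_label_pr]
    return sum(scores) / len(scores)
```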
3.2 Preprocessing Input Data
In the preprocessing step, we prepare the data for use in the following steps. First of all, we calibrate the depth images, because there is a distance separating the depth camera and the color camera. The Kinect disparity is related to a normalized disparity by

d = (1/8) (d_of − k_d),

where d is the normalized disparity, k_d is the Kinect disparity, and d_of is an offset value particular to a given Kinect device. According to a technical Kinect calibration report [17], d_of is typically around 1090. Moreover, in every depth image there is a small black band on the right of the image, which we eliminate before mapping the depth image to the color image later.
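The disparity relation from the calibration report can be sketched as a small conversion routine. The normalized-disparity formula follows the text; the depth recovery step `z = b · f / d` is a standard stereo model added for context, and the baseline and focal-length values are illustrative assumptions, not calibrated constants from the paper.

```python
def normalized_disparity(kd, d_off=1090.0):
    # d = (1/8) * (d_off - k_d); d_off is typically around 1090
    # for a given Kinect device.
    return (d_off - kd) / 8.0


def depth_from_disparity(kd, baseline=0.075, focal=580.0, d_off=1090.0):
    # Hypothetical depth recovery via the pinhole stereo model z = b*f/d.
    # baseline (metres) and focal (pixels) are assumed example values.
    d = normalized_disparity(kd, d_off)
    return baseline * focal / d if d > 0 else float("inf")
```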
For fast segmentation, we use depth images to segment the table area and the floor area. We choose the first few frames and then find a border that separates the table and floor areas. Because there is no human in these frames, segmentation is easy. Border detection is based on the sum of grayscale values of the depth image for each row, and also on the ratio of disparity between two adjacent rows.
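The row-based border heuristic above can be sketched as follows: sum the grayscale (disparity) values per row and report the first row whose ratio to the previous row exceeds a threshold. The `ratio_thresh` value is an assumed parameter for illustration, not one stated in the paper.

```python
def find_border_row(depth, ratio_thresh=1.5):
    # depth: 2-D list of per-pixel grayscale disparity values from a
    # human-free frame. Returns the index of the first row whose sum
    # jumps sharply relative to the row above it, or None if no such
    # jump is found.
    row_sums = [sum(row) for row in depth]
    for r in range(1, len(row_sums)):
        prev, cur = row_sums[r - 1], row_sums[r]
        if prev > 0 and cur / prev > ratio_thresh:
            return r
    return None
```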