Image Processing Reference
Comparison of the Results of Video Segment Classification for the Weizmann Dataset

Method                          Recognition rate (%)
Chaudry et al. [14]             94.44
Ali et al. [15]                 92.60
This article                    90.32
Niebles et al. [20]             90.00
Niebles and Fei-Fei [19]        72.80
The methods shown in Table 2 are some of the well-known reference studies tackling
features having time dimension. The classification is done according to these novel 3D fea-
classification, whereas Gianluigi and Raimondo [14] focus on representing the segments
as frame-by-frame optical flows with high dimensions, causing the curse of dimensionality problem.
Instead of dealing with the representation, the method aims to contribute by finding new met-
other hand, propose a representation structure based on direction histograms of optical flow.
ments as bag-of-features and makes the classification according to the code words. A bag-of-
Analyzing the results, we see that the methods proposing new interest point-based 3D
features are more successful than the other models. However, these features are specific to
the dataset, which makes the solution dependent on the dataset type. Methods that focus on
mining the highly over-descriptive data in the time domain exhibit high success rates, as they
develop the model independently of the contributions in video features. However, their
high-dimensional representation is a disadvantage in terms of time complexity. Our optical
flow-based method yields better results than the approaches using optical flow-based segment
representation. It is also more successful than the BoW-based methods.
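
The trade-off discussed above, between frame-by-frame optical flow and a histogram-based
summary, can be illustrated with a minimal sketch. It is not the implementation of any method
listed in the table; the function names, the number of bins, and the magnitude weighting are
assumptions made for illustration only. Each frame is assumed to be given as a list of
(dx, dy) flow vectors.

```python
import math


def direction_histogram(flow, bins=8):
    """Magnitude-weighted histogram of flow directions, normalised to sum to 1.

    `flow` is an iterable of (dx, dy) displacement vectors (hypothetical input
    format); `bins` is the number of equal angular sectors covering [0, 2*pi).
    """
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # a zero vector has no direction
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map to [0, 2*pi)
        # Clamp guards against rounding that lands exactly on 2*pi.
        idx = min(int(angle / (2.0 * math.pi) * bins), bins - 1)
        hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def segment_descriptor(frames, bins=8):
    """Average the per-frame histograms over a video segment."""
    hists = [direction_histogram(f, bins) for f in frames]
    if not hists:
        return [0.0] * bins
    return [sum(col) / len(hists) for col in zip(*hists)]
```

With 8 bins, the descriptor has 8 values per segment regardless of frame size or segment
length, whereas concatenating dense flow for T frames of W x H pixels gives 2 x W x H x T values.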
In this study, we tried to solve a combination of different problems in action recognition.
The fundamental problem that inspires us is the representation of temporal information. In many
fields, representation of temporal information is essential to retrieve information from a
temporal dataset. The solution to the problem varies from representing each temporal entity in a
different time slice to representing a simple summary of the whole time interval. Efforts to
find a solution between these two endpoints should tackle the problem from different
points of view. This is because the level of representation changes with the source of the
problem. For instance, to represent all the information in all time slices for symbolizing the
temporal information having high-frequency over time, one should handle the curse of di-