Spatio-temporal Dynamic Texture Descriptors for Human Motion Recognition - Intelligent Video Event Analysis and Understanding - page 89


by the choice of the threshold, and we have chosen a suitable threshold to obtain 80 detected STIPs for this comparison. It should also be noted that Laptev's executable code is compiled in a C environment, while our LBP-TOP implementation runs in Matlab. Performance similar to Laptev's is achieved with the Extended LBP-TOP descriptor, which is almost 3 times faster to compute than the Extended Gradient LBP-TOP descriptor.

Table 5 Accuracy and computational time for different LBP-TOP methods and HOG-HOF, k=1000 visual words

Method                                Descriptor length   Computational time (s)   Accuracy (SVM)
LBP-TOP 8,8,8,2,2,2                   768                 0.0139                   86.25%
Ext Grad LBP-TOP 8,8,8,2,2,2          2304                0.0992                   90.72%
Ext Grad LBP-TOP 8,8,8,2,2,2 + PCA    100                 0.1004                   91.25%
HOG-HOF                               162                 0.2820*                  89.88%
HOG-HOF + PCA                         100                 0.2894*                  89.28%
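
Table 5 reports descriptors reduced to 100 dimensions with PCA. A minimal NumPy sketch of such a reduction follows, assuming the projection is learned by SVD from a matrix of training descriptors (one row per cuboid). The function names fit_pca and project are hypothetical, and the random data merely stands in for real 2304-dimensional Extended Gradient LBP-TOP descriptors.

```python
import numpy as np

def fit_pca(train: np.ndarray, n_components: int = 100):
    """Learn a PCA projection from training descriptors (one row per cuboid)."""
    mean = train.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project descriptors onto the retained principal directions."""
    return (x - mean) @ components.T

rng = np.random.default_rng(0)
train = rng.random((500, 2304))  # synthetic stand-ins for 2304-D descriptors
mean, comps = fit_pca(train, 100)
reduced = project(train, mean, comps)
print(reduced.shape)  # (500, 100)
```

The same fitted mean and components would be reused to project test descriptors, so that training and test data live in one shared 100-dimensional space.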

5 Conclusion

In this chapter, we have applied LBP-TOP as a descriptor of small video patches used in a part-based approach to human action recognition. We have shown that the LBP-TOP descriptor can be suitable for describing cuboids that are extracted from video sequences and contain information about human movements and actions.
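
The LBP-TOP idea summarised above can be sketched in a few lines of NumPy. This is a simplified illustration, not the chapter's Matlab implementation: it assumes the cuboid is a (t, y, x) array, reads the subscript 8,8,8,2,2,2 in Table 5 as P_XY, P_XT, P_YT, R_X, R_Y, R_T (the usual LBP-TOP convention), and rounds neighbour positions to the nearest voxel instead of interpolating. The helper names plane_lbp_histogram and lbp_top are hypothetical.

```python
import numpy as np

def plane_lbp_histogram(vol, plane_axes, radii, p=8):
    """Normalised LBP histogram over all interior voxels in one orthogonal plane.

    vol:        3-D array indexed (t, y, x)
    plane_axes: the two axes spanning the plane: (1, 2)=XY, (0, 2)=XT, (0, 1)=YT
    radii:      sampling radius along each axis, in (R_T, R_Y, R_X) order
    Neighbour offsets are rounded to the nearest voxel (simplification).
    """
    rt, ry, rx = radii
    T, H, W = vol.shape
    center = vol[rt:T - rt, ry:H - ry, rx:W - rx]
    codes = np.zeros(center.shape, dtype=np.int64)
    a, b = plane_axes
    for k in range(p):
        angle = 2 * np.pi * k / p
        offset = [0, 0, 0]
        offset[a] = int(round(radii[a] * np.cos(angle)))
        offset[b] = int(round(radii[b] * np.sin(angle)))
        dt, dy, dx = offset
        # Shifted view of the volume: the k-th neighbour of every centre voxel.
        neighbour = vol[rt + dt:T - rt + dt, ry + dy:H - ry + dy, rx + dx:W - rx + dx]
        codes |= (neighbour >= center).astype(np.int64) << k
    hist = np.bincount(codes.ravel(), minlength=2 ** p).astype(float)
    return hist / hist.sum()

def lbp_top(vol, radii=(2, 2, 2), p=8):
    """Concatenate the XY, XT and YT histograms: 3 * 2**p bins (768 for p = 8)."""
    vol = np.asarray(vol, dtype=float)
    return np.concatenate([plane_lbp_histogram(vol, axes, radii, p)
                           for axes in [(1, 2), (0, 2), (0, 1)]])

rng = np.random.default_rng(0)
cuboid = rng.random((9, 24, 24))  # synthetic cuboid: 9 frames of 24x24 pixels
desc = lbp_top(cuboid)
print(desc.shape)  # (768,)
```

With P = 8, each plane contributes 2^8 = 256 bins, so the concatenated descriptor has 3 × 256 = 768 entries, which matches the LBP-TOP descriptor length listed in Table 5.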

We have modified the original descriptor by introducing the CSLBP-TOP descriptor, and we applied the LBP and CS-LBP operators to the original, gradient and Gabor images. Moreover, we extended LBP-TOP and CSLBP-TOP by considering the action at three different frames in the XY plane and at different views in the XT and YT planes. We have also shown that the performance of the descriptor remains quite stable when PCA is applied.
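
The CS-LBP operator mentioned above compares centre-symmetric pairs of neighbours rather than each neighbour with the centre pixel, so with P = 8 neighbours it yields 2^4 = 16 codes instead of 2^8 = 256. The sketch below applies it to a single frame and to a gradient-magnitude image. The threshold of 0.01, nearest-pixel sampling, and the use of np.gradient to build the gradient image are illustrative assumptions, not details taken from the chapter; cs_lbp_histogram is a hypothetical name.

```python
import numpy as np

def cs_lbp_histogram(img, radius=1, p=8, threshold=0.01):
    """Normalised CS-LBP histogram of a 2-D image.

    Compares the p/2 centre-symmetric neighbour pairs (n_i, n_{i+p/2})
    instead of each neighbour with the centre, giving 2**(p/2) bins
    (16 for p = 8). Neighbours are sampled at the nearest pixel.
    """
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    r = radius

    def neighbour(k):
        # Shifted view holding the k-th neighbour of every interior pixel.
        angle = 2 * np.pi * k / p
        dy = int(round(-r * np.sin(angle)))
        dx = int(round(r * np.cos(angle)))
        return img[r + dy:H - r + dy, r + dx:W - r + dx]

    codes = np.zeros((H - 2 * r, W - 2 * r), dtype=np.int64)
    for i in range(p // 2):
        diff = neighbour(i) - neighbour(i + p // 2)
        codes |= (diff > threshold).astype(np.int64) << i
    hist = np.bincount(codes.ravel(), minlength=2 ** (p // 2)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
frame = rng.random((32, 32))
gy, gx = np.gradient(frame)   # spatial derivatives along y and x
grad_mag = np.hypot(gx, gy)   # illustrative gradient image
print(cs_lbp_histogram(frame).shape, cs_lbp_histogram(grad_mag).shape)  # (16,) (16,)
```

The much shorter histogram is the main attraction of CS-LBP: applied in three planes it would give 3 × 16 = 48 bins rather than 768.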

The Extended Gradient LBP-TOP descriptor gives the best results on the KTH human action database, reaching 92.69% classification accuracy (92.57% when PCA is applied) with a 1-NN classifier using the χ² distance and a codebook size of 1250. With an SVM classifier, the accuracy is slightly lower: 91.46%, and 91.34% when PCA is applied. Figure 16 shows the confusion matrices for both classifiers. Most of the confusion occurs between the classes running and jogging, as these actions are very similar to each other, while all actions performed with the hands and arms are classified quite accurately.
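
The classification pipeline described in the last paragraph (quantising cuboid descriptors against a codebook, building a bag-of-words histogram per video, and assigning the label of the nearest training video under the χ² distance) can be sketched as follows. The codebook here is random and tiny (50 words rather than 1250); in practice it would be learned, for example with k-means, from training descriptors. The function names, the synthetic "videos", and the 0.5-scaled form of the χ² distance are assumptions for illustration.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantise cuboid descriptors to their nearest visual word and count them."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance 0.5 * sum((a - b)^2 / (a + b)); eps avoids 0/0."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(query, train_hists, train_labels):
    """1-NN: label of the training histogram closest in chi-square distance."""
    dists = [chi2_distance(query, h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
codebook = rng.random((50, 768))  # toy codebook; the chapter uses 1250 words

def fake_video(word_range):
    """Synthetic video: 30 noisy copies of codewords drawn from word_range."""
    idx = rng.integers(word_range[0], word_range[1], size=30)
    return codebook[idx] + 0.01 * rng.standard_normal((30, 768))

train_hists = ([bow_histogram(fake_video((0, 25)), codebook) for _ in range(3)]
               + [bow_histogram(fake_video((25, 50)), codebook) for _ in range(3)])
train_labels = ["boxing"] * 3 + ["running"] * 3
query = bow_histogram(fake_video((0, 25)), codebook)
print(nn_classify(query, train_hists, train_labels))  # boxing
```

Histograms with disjoint support reach the maximum distance of 1 under this normalisation, so the query is matched to a training video that shares its visual words.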
