Spatio-temporal Dynamic Texture Descriptors for Human Motion Recognition - Intelligent Video Event Analysis and Understanding

directions and combined into the gradient cuboid gives LBP-TOP better performance in describing actions. Gabor filtering applied to the original cuboid also increases the final accuracy of LBP-TOP, although less than the gradient image does. A possible explanation is that the gradient images encode more relevant information about the motion inside each video patch than the Gabor images. Moreover, the gradient image better defines the borders of the movement, while the Gabor image better highlights the area of motion, as shown in Figure 15. A further improvement in performance can be achieved by applying the Extended LBP-TOP to the gradient or Gabor cuboids. The Extended LBP-TOP applied to Gabor cuboids performs very close to the Extended LBP-TOP applied to gradient cuboids.

Figure 13 compares LBP-TOP and CSLBP-TOP with the number of visual words fixed at k = 1000. The performance of the CSLBP-TOP operator is close to that of LBP-TOP, and Extended CSLBP-TOP is likewise very similar to Extended LBP-TOP. CSLBP-TOP needs a higher number of neighbors to reach better classification accuracy; as the plots show, 10 or 12 neighbors give performance similar to, and even slightly better than, the original LBP-TOP. However, for a fixed number of neighbors the CSLBP-TOP descriptor is much shorter than that of LBP-TOP: with 8 neighbors it is 16 times shorter, with 48 dimensions against 768 for LBP-TOP.
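The two descriptor lengths quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes (the passage does not state it) one histogram per orthogonal plane, with 2^P bins for LBP and 2^(P/2) bins for CS-LBP; the function names are illustrative only.

```python
def lbp_top_length(neighbors, planes=3):
    """Histogram length for LBP-TOP: one 2**P-bin histogram per plane (assumed)."""
    return planes * 2 ** neighbors


def cslbp_top_length(neighbors, planes=3):
    """Histogram length for CSLBP-TOP: P/2 center-symmetric pairs per plane (assumed)."""
    if neighbors % 2:
        raise ValueError("CS-LBP needs an even number of neighbors")
    return planes * 2 ** (neighbors // 2)


# Print the two lengths side by side for the neighbor counts discussed in the text.
for p in (8, 10, 12):
    print(p, lbp_top_length(p), cslbp_top_length(p))
```

Under these assumptions the 16-fold ratio is specific to P = 8. With P = 12 the CSLBP-TOP descriptor would have 192 bins, still shorter than the 768-bin LBP-TOP descriptor with P = 8.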

Figure 14 highlights the best results for LBP-TOP and CSLBP-TOP. The best results were achieved with the original LBP-TOP implementation. The CSLBP operator applied to gradient or Gabor images gives lower accuracy.

In general, CSLBP-TOP performs similarly to the original LBP-TOP in human action recognition. If the number of neighbors is increased (i.e. P = 12), Extended CSLBP-TOP slightly outperforms Extended LBP-TOP (P = 8), as more spatial information is taken into account when computing the CS-LBP operator. The Extended Gradient CSLBP-TOP version performs best among the descriptors based on the CS-LBP operator, almost reaching the performance of Extended Gradient LBP-TOP with the 1-NN classifier.
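To make the difference between the two operators concrete, the sketch below computes a basic LBP code and a CS-LBP code for a single pixel. It follows the standard textbook definitions rather than the authors' implementation, and it assumes the neighbor values have already been sampled in circular order on one plane; the function names and the sample values are hypothetical.

```python
def lbp_code(center, neighbors):
    """Basic LBP: set bit i when neighbor i is at least as bright as the center."""
    return sum((1 << i) for i, g in enumerate(neighbors) if g >= center)


def cslbp_code(neighbors, threshold=0.0):
    """CS-LBP: set bit i when neighbor i exceeds its diametrically opposite
    neighbor by more than the threshold. The center pixel is not used."""
    half = len(neighbors) // 2
    return sum(
        (1 << i)
        for i in range(half)
        if neighbors[i] - neighbors[i + half] > threshold
    )


# Hypothetical gray values of 8 neighbors around a center pixel of value 45.
ring = [90, 20, 70, 40, 50, 60, 30, 80]
print(lbp_code(45, ring))   # 8 comparisons, code in [0, 255]
print(cslbp_code(ring))     # 4 comparisons, code in [0, 15]
```

With P neighbors, LBP makes P comparisons per pixel and CS-LBP makes P/2, so raising P to 12 for CS-LBP samples a denser ring while still using fewer comparisons than LBP with P = 8.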

The best performance was achieved with the Extended Gradient LBP-TOP. The classification accuracy was 92.69%, and 92.57% when PCA is applied, using the 1-NN classifier with the χ² distance and a codebook size of 1250.
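As an illustration of the classification step, the following sketch shows a 1-NN classifier over bag-of-words histograms using a χ² distance. It is an assumption-laden example, not the authors' code: the 0.5 scaling factor and the small `eps` guard against empty bins are common conventions that the passage does not specify, and the toy histograms and labels are hypothetical.

```python
def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms of equal length.
    The 0.5 factor and eps are conventions assumed here."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def nearest_neighbor_label(query, train_histograms, train_labels):
    """1-NN: return the label of the training histogram closest to the query."""
    distances = [chi2_distance(query, h) for h in train_histograms]
    best = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[best]


# Toy 3-bin histograms; a real codebook would have, for example, 1250 bins.
train = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
labels = ["walking", "boxing"]
print(nearest_neighbor_label([0.6, 0.3, 0.1], train, labels))
```

The χ² distance weights each bin difference by the bin mass, which tends to suit sparse visual-word histograms better than a plain Euclidean distance.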

Table 4 shows the computational time for describing one small video patch, together with the classification accuracy. As can be seen, the time increases when a higher number of slices is taken into account and when the gradient or Gabor cuboid is computed. The CSLBP-TOP implementation is slightly faster, as fewer comparisons have to be computed and the final histogram is shorter. The Extended Gabor LBP-TOP requires the most time among all LBP-TOP methods.
