P_t(h_s \mid k) = HI(h_s, h_k)   (11.15)

P_t(k) = \begin{cases} 1/K & \text{if } t = t_0 \\[4pt] \dfrac{P_{t-1}(k \mid h_s)\, HI(h_s, h_k)}{\sum_{k=1}^{K} P_{t-1}(k \mid h_s)\, HI(h_s, h_k)} & \text{otherwise} \end{cases}   (11.16)

HI(h_s, h_k) = \sum_i \min\,[\, h_{s,i},\, h_{k,i} \,]   (11.17)
According to the equations above, the input sequence is allowed to accumulate
postures over time t , where for each instant, the accumulated gesture is projected
onto the SSOM to generate a posture sequence, which can be converted into one
of the four template representations from Sect. 11.5 . Likelihoods are estimated
as histogram intersections, Eq. ( 11.17 ), between each reference template and that
computed from the input posture sequence. A perfect intersection with a template
will yield a likelihood of 1 for the given class. Note that all templates
are normalized, even if calculated from a gesture sequence containing only a
single posture.
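The histogram intersection of Eq. (11.17) can be sketched as follows; the function name and the 4-bin example histograms are illustrative, not from the original text:

```python
import numpy as np

def histogram_intersection(h_s, h_k):
    """Histogram intersection (Eq. 11.17): sum of bin-wise minima.

    For L1-normalized histograms the result lies in [0, 1];
    a perfect match between input and template yields exactly 1.
    """
    h_s = np.asarray(h_s, dtype=float)
    h_k = np.asarray(h_k, dtype=float)
    return float(np.minimum(h_s, h_k).sum())

# Hypothetical normalized posture histograms over 4 SSOM nodes
template = np.array([0.4, 0.3, 0.2, 0.1])
observed = np.array([0.4, 0.3, 0.2, 0.1])
print(histogram_intersection(template, observed))  # identical -> 1.0
```

Because both histograms sum to 1, any mismatch between bins strictly lowers the intersection below 1.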
As the sequence begins to resemble a gesture from the known set, its posterior
grows and eventually surpasses a detection threshold. Upon triggering this
threshold, the class k with the maximum posterior is considered detected, and the
system resets the priors for all classes and recalculates the posterior. At this point,
in order to free the postures in the accumulated sequence, t_0 is set to the current
time, so that the newly considered sequence grows again from this instant (flushing all
past postures). This process continues, triggering new instances of detected gestures,
until the end of the input sequence is reached.
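The detect-and-reset loop described above can be sketched as a recursive Bayesian update per Eq. (11.16); the function name, the threshold value, and the input format (one length-K array of class likelihoods per time step) are assumptions for illustration:

```python
import numpy as np

def online_recognize(likelihood_seq, threshold=0.9):
    """Sketch of the online gesture detection loop.

    likelihood_seq: iterable of length-K arrays, each holding the
    histogram-intersection likelihood HI(h_s, h_k) for every class k
    at one time step. When the maximum posterior crosses the detection
    threshold, the winning class is recorded, the priors are reset to
    the uniform 1/K, and the sequence restarts (t_0 = current time).
    """
    detections = []
    posterior = None
    for t, lik in enumerate(likelihood_seq):
        lik = np.asarray(lik, dtype=float)
        if posterior is None:            # t = t_0: uniform prior, Eq. (11.16)
            posterior = np.full(lik.size, 1.0 / lik.size)
        posterior = posterior * lik      # numerator of Eq. (11.16)
        posterior /= posterior.sum()     # normalize over the K classes
        if posterior.max() > threshold:
            detections.append((t, int(posterior.argmax())))
            posterior = None             # flush past postures, new t_0
    return detections

# Toy run: class 1 is consistently the better match
print(online_recognize([[0.1, 0.9]] * 5))
```

Because the posterior is multiplied by the likelihood at every step, consistent evidence for one class compounds quickly, which is what produces the sharp switching behaviour seen in the posterior traces.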
In order to assess the online capability of the system to recognize and isolate ges-
tures from a continuous dance sequence, two datasets were constructed (Table 11.6).
In these datasets, all component gestures are represented in the trained posture
space. The continuous dance sequences are composed of gestures G1 to G6, as discussed
previously in Table 11.1. Online recognition is applied for both the Teacher and
the Student, using the PO and PT descriptors respectively. The posterior probability is
captured as a trace (for each gesture class) over the duration of the dance sequence.
Results for the Teacher sequence are shown in Fig. 11.6 , while results for the Student
sequence are shown in Fig. 11.7 .
The results for the Teacher show that, for both descriptors, the posterior appears
to be quite robust in estimating and switching between gestures. The maximum
posterior is selected as the prediction of the gesture class at each time sample in
the sequence (shown in Fig. 11.6, bottom left and right). The prediction is able
to extract and segment, in an online manner, the duration of each gesture in
the sequence: G6, G1, G2, G3, G4, G5, with only minor noise at the beginning and
end of each dance. According to this result, the system can recognise the
dance gestures from the continuous sequence with 100 % accuracy. It is apparent
that an additional class is needed to capture stray postures outside the learned
set; otherwise the posterior will attempt to lock onto the best available representation
for the input (e.g. G5 at the beginning of the sequence).