P_t(h_s \mid k) = HI(h_s, h_k)   (11.15)

P_t(k) = \begin{cases} 1/K & \text{if } t = t_0 \\[4pt] \dfrac{P_{t-1}(k \mid h_s)\, HI(h_s, h_k)}{\sum_{k=1}^{K} P_{t-1}(k \mid h_s)\, HI(h_s, h_k)} & \text{otherwise} \end{cases}   (11.16)

HI(h_s, h_k) = \sum_i \min\,[\, h_{s,i},\, h_{k,i} \,]   (11.17)
According to the equations above, the input sequence is allowed to accumulate
postures over time t , where for each instant, the accumulated gesture is projected
onto the SSOM to generate a posture sequence, which can be converted into one
of the four template representations from Sect. 11.5 . Likelihoods are estimated
as histogram intersections, Eq. ( 11.17 ), between each reference template and that
computed from the input posture sequence. A perfect intersection with a template
will yield a likelihood of 1 for the given class. Note that all templates
are normalized, even if calculated from a gesture sequence containing only a
single posture.
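The histogram intersection of Eq. (11.17) can be sketched as follows; the function name and the 4-bin example histograms are illustrative, not from the original text:

```python
import numpy as np

def histogram_intersection(h_s, h_k):
    """Histogram intersection (Eq. 11.17): sum of bin-wise minima.

    For L1-normalized histograms the result lies in [0, 1];
    a perfect match between input and template yields exactly 1.
    """
    h_s = np.asarray(h_s, dtype=float)
    h_k = np.asarray(h_k, dtype=float)
    return float(np.minimum(h_s, h_k).sum())

# Hypothetical normalized posture histograms over 4 SSOM nodes
template = np.array([0.4, 0.3, 0.2, 0.1])
observed = np.array([0.4, 0.3, 0.2, 0.1])
print(histogram_intersection(template, observed))  # identical -> 1.0
```

Because both histograms sum to 1, any mismatch between bins strictly lowers the intersection below 1.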
As the sequence begins to resemble a gesture from the known set, its posterior
grows and eventually surpasses a detection threshold. Upon triggering this
threshold, the class k with the maximum posterior is considered detected, and the
system resets the priors for all classes and recalculates the posterior. At this point,
in order to free the postures in the accumulated sequence, t_0 is set to the current
time, so that the newly considered sequence grows again from this instant (flushing all
past postures). This process continues, triggering new instances of detected gestures,
until the end of the input sequence is reached.
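The detect-and-reset loop described above can be sketched as a recursive Bayesian update per Eq. (11.16); the function name, the threshold value, and the input format (one length-K array of class likelihoods per time step) are assumptions for illustration:

```python
import numpy as np

def online_recognize(likelihood_seq, threshold=0.9):
    """Sketch of the online gesture detection loop.

    likelihood_seq: iterable of length-K arrays, each holding the
    histogram-intersection likelihood HI(h_s, h_k) for every class k
    at one time step. When the maximum posterior crosses the detection
    threshold, the winning class is recorded, the priors are reset to
    the uniform 1/K, and the sequence restarts (t_0 = current time).
    """
    detections = []
    posterior = None
    for t, lik in enumerate(likelihood_seq):
        lik = np.asarray(lik, dtype=float)
        if posterior is None:            # t = t_0: uniform prior, Eq. (11.16)
            posterior = np.full(lik.size, 1.0 / lik.size)
        posterior = posterior * lik      # numerator of Eq. (11.16)
        posterior /= posterior.sum()     # normalize over the K classes
        if posterior.max() > threshold:
            detections.append((t, int(posterior.argmax())))
            posterior = None             # flush past postures, new t_0
    return detections

# Toy run: class 1 is consistently the better match
print(online_recognize([[0.1, 0.9]] * 5))
```

Because the posterior is multiplied by the likelihood at every step, consistent evidence for one class compounds quickly, which is what produces the sharp switching behaviour seen in the posterior traces.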
In order to assess the online capability of the system to recognize and isolate ges-
tures from a continuous dance sequence, two datasets were constructed (Table 11.6).
In these datasets, all component gestures are represented in the trained posture
space. The continuous dance sequences are composed of gestures G1 to G6, as discussed
previously in Table 11.1. Online recognition is applied for both the Teacher and
the Student, using the PO and PT descriptors respectively. The posterior probability is
captured as a trace (for each gesture class) over the duration of the dance sequence.
Results for the Teacher sequence are shown in Fig. 11.6 , while results for the Student
sequence are shown in Fig. 11.7 .
The results for the Teacher show that, for both descriptors, the posterior appears
to be quite robust in estimating and switching between gestures. The maximum
posterior is selected as the prediction of the gesture class at each time sample in
the sequence (shown in Fig. 11.6, bottom left and right). The prediction is able
to extract and segment, in an online manner, the duration of each gesture in
the sequence: G6, G1, G2, G3, G4, G5, with only minor noise at the beginning and
end of each dance. According to this result, the system can recognise the
dance gestures from the continuous sequence with 100 % accuracy. It is apparent
that an additional class is needed to capture stray postures outside the learned
set; otherwise the posterior will attempt to lock onto the best available representation
for the input (e.g. G5 at the beginning of the sequence).