Motion Database Retrieval with Application to Gesture Recognition in a Virtual Reality Dance Training System - Multimedia Database Retrieval: Technology and Applications


Firstly, an input posture vector $x$ is randomly selected from the training set and presented to the SSOM. Over all voxels (map nodes), the best matching unit (BMU), $w_{(i,j,k)^*}$, is selected, i.e.,

$$(i,j,k)^* = \arg\min_{(i,j,k)} \{E_{i,j,k}\} \tag{11.1}$$

where $E_{i,j,k}$ is the difference between the current input vector and the weight vector of each cluster unit,

$$E_{i,j,k} = \tau(u_{i,j,k}) \sum_{n=1}^{N} \left(x_n - w_{n,i,j,k}\right)^2 \tag{11.2}$$

Here $x_n$ is the $n$-th component of the input vector, $w_{n,i,j,k}$ is the weight from the $n$-th input, $n = 1, \ldots, N$, to the $(i,j,k)$-th node, $i, j, k = 1, \ldots, K$, and $\tau(u_{i,j,k})$ is a count-dependent non-decreasing function, used to prevent cluster under-utilization.

Secondly, information from $x$ is imparted to the weights of the winning cluster unit $w_{(i,j,k)^*}$ and of all the units residing within the specified neighbourhood $NE_{(i,j,k)^*}$, using

$$w^{(\mathrm{new})}_{n,(i,j,k)^*} = w^{(\mathrm{old})}_{n,(i,j,k)^*} + \alpha \left[x_n - w^{(\mathrm{old})}_{n,(i,j,k)^*}\right] \tag{11.3}$$

where

$$\alpha = \mu \left(\frac{NE_{(i,j,k)^*}}{NE_{\mathrm{initial}}}\right) \tag{11.4}$$

$\mu$ is a predefined learning rate, and $NE_{\mathrm{initial}}$ is the initial neighbourhood size in terms of the number of units.
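The BMU search in Eqs. (11.1) and (11.2) can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' implementation: the map is stored as a hypothetical `(K, K, K, N)` weight array indexed by $(i, j, k)$, $u_{i,j,k}$ is assumed to be the number of times each unit has already won, and the form $\tau(u) = 1 + \beta u$ (with a hypothetical parameter `beta`) is an assumed choice, since the text only requires $\tau$ to be non-decreasing.

```python
import numpy as np


def tau(u, beta=0.1):
    """Hypothetical count-dependent, non-decreasing penalty tau(u).
    The source does not give its form; 1 + beta * u is an assumed choice."""
    return 1.0 + beta * u


def find_bmu(x, weights, win_counts, beta=0.1):
    """Return (i, j, k)* following Eqs. (11.1) and (11.2).

    x          : (N,) input posture vector
    weights    : (K, K, K, N) array holding w_{n,i,j,k}
    win_counts : (K, K, K) array holding u_{i,j,k} (assumed: wins per unit)
    """
    sq_dist = np.sum((x - weights) ** 2, axis=-1)   # sum over n of (x_n - w_{n,i,j,k})^2
    energy = tau(win_counts, beta) * sq_dist        # E_{i,j,k}, Eq. (11.2)
    flat_index = np.argmin(energy)                  # arg min, Eq. (11.1)
    return tuple(int(v) for v in np.unravel_index(flat_index, energy.shape))
```

After each call, the caller would increment `win_counts` at the returned index, so a unit that keeps winning is gradually penalized and other units get a chance to represent postures.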

This process of information sharing [i.e., Eq. (11.3)] allows the map nodes to tune themselves to characteristic postures in the input space, while forcing nearby nodes to tune to related or adjacent postures.
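The neighbourhood update of Eqs. (11.3) and (11.4) can be sketched as follows. The shape of $NE_{(i,j,k)^*}$ is not specified in this section, so the hypothetical `neighbourhood` helper below assumes a cube of units whose $i$, $j$ and $k$ indices each lie within `radius` of the winner, clipped to the grid; a genuinely spherical map would instead measure adjacency on its sphere tessellation. The numerator of Eq. (11.4) is taken to be the number of units in the current neighbourhood.

```python
import numpy as np


def neighbourhood(bmu, K, radius):
    """Hypothetical NE_{(i,j,k)*}: the winner plus every unit whose i, j and k
    indices each lie within `radius` of the winner's (a cube clipped to the grid)."""
    spans = [range(max(c - radius, 0), min(c + radius, K - 1) + 1) for c in bmu]
    return [(i, j, k) for i in spans[0] for j in spans[1] for k in spans[2]]


def update_weights(x, weights, bmu, radius, mu, ne_initial):
    """Apply Eqs. (11.3) and (11.4) in place; return the learning rate alpha."""
    units = neighbourhood(bmu, weights.shape[0], radius)
    alpha = mu * len(units) / ne_initial              # Eq. (11.4)
    for unit in units:                                  # winner and its neighbours
        weights[unit] += alpha * (x - weights[unit])    # Eq. (11.3), all n at once
    return alpha
```

If the neighbourhood radius is reduced over training (a common SOM schedule, assumed here rather than stated in the text), $\alpha$ in Eq. (11.4) shrinks with it, so later updates make finer adjustments.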

Thirdly, the first two steps are repeated. As new input postures are presented from the training set, the map nodes compete to become their BMUs, resulting in a locally organized distribution of key postures over the nodes of the map.

Finally , learning is terminated after a maximum number of cycles has been reached.
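The four steps can be combined into one training loop. The self-contained sketch below is illustrative only; its assumptions, none of which come from the text, are random initial weights, $\tau(u) = 1 + \beta u$, a cube-shaped neighbourhood in $(i, j, k)$ index space whose radius shrinks linearly from a hypothetical `r0` to 0, $NE_{\mathrm{initial}} = (2 r_0 + 1)^3$ units, and one "cycle" meaning one randomly drawn posture.

```python
import numpy as np


def train_ssom(data, K=4, cycles=500, mu=0.5, r0=2, beta=0.1, seed=0):
    """Sketch of the four-step procedure on a (K, K, K) grid of units.

    data : (P, N) array of training posture vectors.
    Assumptions (not from the source): random initial weights,
    tau(u) = 1 + beta * u, a cube-shaped neighbourhood whose radius
    shrinks linearly from r0 to 0, and NE_initial = (2 * r0 + 1) ** 3.
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((K, K, K, data.shape[1]))
    counts = np.zeros((K, K, K))                     # u_{i,j,k}: wins per unit
    grid = np.indices((K, K, K)).reshape(3, -1).T    # every (i, j, k) triple
    ne_initial = (2 * r0 + 1) ** 3

    for t in range(cycles):                          # Steps 3 and 4: repeat, then stop
        x = data[rng.integers(len(data))]            # Step 1: random posture
        energy = (1.0 + beta * counts) * np.sum((x - weights) ** 2, axis=-1)
        bmu = np.array(np.unravel_index(np.argmin(energy), energy.shape))
        counts[tuple(bmu)] += 1

        radius = round(r0 * (1 - t / cycles))        # shrinking neighbourhood
        nbrs = grid[np.all(np.abs(grid - bmu) <= radius, axis=1)]
        alpha = mu * len(nbrs) / ne_initial          # Eq. (11.4)
        for i, j, k in nbrs:                         # Step 2: Eq. (11.3)
            weights[i, j, k] += alpha * (x - weights[i, j, k])
    return weights, counts
```

Fixing `ne_initial` to the unclipped cube size keeps $\alpha \le \mu$ even when the winner lies on the edge of the grid, so every update stays a convex step toward the input posture.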

11.4 Characterization of Dance Gesture Using Spherical Self-organizing Map

The SSOM was applied to characterize dance gestures in the dance training system shown in Fig. 11.1 [340]. The Microsoft Kinect system provides 20 3D skeleton points to represent each player (student) in the camera's field of view; these points correspond to 20 joint positions of the body. In each frame, the normalized locations of all 20 joints were used to construct a feature vector

$$x = [x_1 \; \ldots \; x_i \; \ldots \; x_{60}]^t,$$

where $x_i$ is the $i$-th joint coordinate, i.e., the position of one joint along the x, y, or z axis. With all 20 joints in three dimensions, the dimension of $x$ was 60. Each location $x_i$ was obtained by normalizing its original value: the hip location was taken as the reference point, and all other joints were expressed relative to the hip.
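Building the 60-dimensional feature vector from one Kinect frame can be sketched as below. The sketch assumes the 20 joints arrive as a `(20, 3)` array, that the hip is at a hypothetical index `HIP_INDEX = 0` (check the joint ordering of the skeleton tracker actually used), and that coordinates are concatenated joint by joint as $(x, y, z)$; the text does not state the ordering or any additional scaling, so only hip-relative normalization is applied.

```python
import numpy as np

HIP_INDEX = 0  # hypothetical index of the hip joint in the 20-joint skeleton


def posture_vector(joints, hip_index=HIP_INDEX):
    """Build the 60-D feature vector x = [x_1 ... x_60]^t for one frame.

    joints : (20, 3) array of 3D joint positions from the skeleton tracker.
    Each joint is expressed relative to the hip, then the (x, y, z)
    coordinates are concatenated joint by joint (an assumed ordering).
    """
    joints = np.asarray(joints, dtype=float)
    if joints.shape != (20, 3):
        raise ValueError("expected 20 joints with 3 coordinates each")
    relative = joints - joints[hip_index]   # hip becomes the origin
    return relative.reshape(-1)             # shape (60,)
```

Because every joint is measured from the hip, the vector does not change when the student moves around the room, which is what lets the SSOM compare postures rather than positions.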
