Motion Database Retrieval with Application to Gesture Recognition in a Virtual Reality Dance Training System - Multimedia Database Retrieval: Technology and Applications


a distance score. The distance scores for all the reference templates are sent to a

decision rule, which provides a classification of the input gesture, and possibly an

ordered (by distance) set of the best n candidates.

11.3 Spherical Self-organizing Map (SSOM)

Prior to recognition, the system discussed in Fig. 11.3 creates the gesture reference

templates using a training algorithm. The first step is to automatically parse samples

from across the spectrum of expected dance movements into a discrete set of

postures. This is achieved using SSOM, an unsupervised clustering algorithm that

reduces a large number of input data vectors to a small set of prototypical units. The

SSOM enables learned postures to be allocated to, and distributed across, nodes on

a predefined lattice [344, 348]. This results from the wrap-around neighbourhood

learning that occurs when the lattice forms a closed loop sphere. A useful feature

of an SSOM-based approach is that the discrete space is constructed in such a way

as to retain associations that exist in the original input space, i.e. learned postures

are positioned on the map near other postures that are very similar in nature.

As a consequence of this topology-preserving mapping, a sequence of postures

(which make up the movement or gesture) should trace a rather smooth trajectory on

the map. It is from this trajectory (sequence of key postures) that the descriptors are

acquired for representing each gesture.
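
As an illustration of how a gesture might be turned into a trajectory of key postures, the sketch below assigns each posture vector to its nearest node and collapses consecutive repeats. It is a minimal sketch under stated assumptions, not the method of [344, 348]: the Euclidean nearest-node rule, the collapsing of repeated nodes, and the names `best_matching_units` and `key_posture_trajectory` are all choices made for this example.

```python
import numpy as np


def best_matching_units(postures, weights):
    """Index of the nearest node for each posture (Euclidean distance assumed).

    postures: array of shape (T, D), one posture vector per frame.
    weights:  array of shape (M, D), one weight vector per map node.
    """
    diffs = postures[:, None, :] - weights[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)


def key_posture_trajectory(postures, weights):
    """Sequence of visited nodes, with consecutive repeats removed (an assumption)."""
    bmus = best_matching_units(postures, weights)
    keep = np.ones(len(bmus), dtype=bool)
    keep[1:] = bmus[1:] != bmus[:-1]
    return bmus[keep].tolist()


# Toy example: three nodes in a 2-D input space, five frames of a "gesture".
weights = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
postures = np.array([[0.1, 0.0], [0.0, 0.1], [0.9, 0.1], [1.0, 0.9], [1.1, 1.0]])
trajectory = key_posture_trajectory(postures, weights)  # [0, 1, 2]
```

Because similar postures sit on neighbouring nodes, successive entries of such a trajectory would be expected to lie close together on the map.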

The map's spherical lattice is constructed by progressively sub-dividing a regular
icosahedron down to a desired level ($l$). This results in a series of nodes uniformly
arranged on a tessellated unit sphere (with uniform triangular elements). A sphere
at the first tessellation level would result in 12 nodes, while the next two levels
would result in lattices of 42 and 162 nodes, respectively. Each node on the sphere is
then represented by a weight vector $w_{i,j,k} \in \mathbb{R}^D$, which models a key
posture from the input space, where $w_{i,j,k}$ is the weight vector of the
$(i,j,k)$th node. The total number of nodes represents the number of postures that
can be learned by the map. In this representation, nodes are each equidistant from
their immediate neighbours, with which they form a hexagonal neighbourhood.
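
The node counts quoted above can be checked with a short sketch that subdivides an icosahedron and counts the resulting vertices. This is an illustrative construction, assuming the usual midpoint subdivision in which each triangle is split into four and new vertices are projected onto the unit sphere. The function names and the `max_subdivisions` counter are introduced here for the example, and no claim is made about how that counter maps onto the level $l$ used in the text.

```python
import itertools
import math


def _unit(v):
    """Project a 3-D point onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def icosahedron():
    """Vertices and triangular faces of a regular icosahedron on the unit sphere."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for a, b in itertools.product((-1.0, 1.0), (-phi, phi)):
        verts += [_unit((0.0, a, b)), _unit((a, b, 0.0)), _unit((b, 0.0, a))]
    edge = 2.0 / math.sqrt(1 + phi * phi)  # edge length after normalisation

    def adjacent(i, j):
        return abs(math.dist(verts[i], verts[j]) - edge) < 1e-9

    # In an icosahedron, every triple of mutually adjacent vertices is a face.
    faces = [f for f in itertools.combinations(range(12), 3)
             if all(adjacent(i, j) for i, j in itertools.combinations(f, 2))]
    return verts, faces


def subdivide(verts, faces):
    """Split every triangle into four, sharing midpoints between neighbours."""
    verts = list(verts)
    cache = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            mid = tuple((p + q) / 2 for p, q in zip(verts[i], verts[j]))
            cache[key] = len(verts)
            verts.append(_unit(mid))
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


def node_counts(max_subdivisions):
    """Number of lattice nodes after 0, 1, ..., max_subdivisions subdivisions."""
    verts, faces = icosahedron()
    counts = [len(verts)]
    for _ in range(max_subdivisions):
        verts, faces = subdivide(verts, faces)
        counts.append(len(verts))
    return counts
```

Calling `node_counts(2)` returns `[12, 42, 162]`, matching the lattice sizes given in the text.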

Figure 11.4 shows a cluster unit of the SSOM. Each training pattern in the input
space is connected to every cluster unit by a weight vector $w_{i,j,k}$. Every cluster
unit $(i,j,k)$ has a variable neighbourhood ($NE_{i,j,k}$) with a decreasing radius.
All the nodes that fall within the area defined by $NE_{i,j,k}$ constitute the
region-of-influence. Let $T = \{x_i\}_{i \geq 1}$ be the training set, where
$x \in \mathbb{R}^D$. Each vector $x$ is referred to as a posture vector in a dance
gesture. The learning process of the SSOM starts by initializing the weight vectors
$w_{i,j,k}$ with small random values distributed throughout the input space. Various
steps are employed by the SSOM to topologically reorder the cluster weights on the
spherical lattice, as follows [344, 348]:
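
The sketch below illustrates only the ingredients named in this paragraph: small random initial weights, and a neighbourhood around the winning node whose radius decreases during training. It is a generic self-organizing map update and should not be read as the procedure of [344, 348]. The hop-count neighbourhood, the linear decay schedules, the use of the 12-node base icosahedron as the lattice, and the names `train`, `lr0` and `radius0` are all assumptions made for this example.

```python
import itertools
import math

import numpy as np


def icosahedron_adjacency():
    """Boolean 12x12 adjacency matrix of the base icosahedron lattice."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for a, b in itertools.product((-1.0, 1.0), (-phi, phi)):
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    verts = np.array(verts)
    dist = np.linalg.norm(verts[:, None] - verts[None, :], axis=2)
    return np.isclose(dist, 2.0)  # edges have length 2 in these coordinates


def hop_distances(adj):
    """Number of lattice edges between every pair of nodes (connected graph assumed)."""
    n = len(adj)
    hops = np.full((n, n), np.inf)
    np.fill_diagonal(hops, 0)
    reach = np.eye(n, dtype=bool)
    step = 0
    while np.isinf(hops).any():
        step += 1
        new_reach = reach | ((reach.astype(int) @ adj.astype(int)) > 0)
        hops[new_reach & ~reach] = step
        reach = new_reach
    return hops


def train(data, adj, epochs=20, lr0=0.5, radius0=2, seed=0):
    """Generic SOM-style training with a shrinking region of influence."""
    rng = np.random.default_rng(seed)
    hops = hop_distances(adj)
    # Small random initial weights, one vector per lattice node.
    weights = rng.uniform(-0.01, 0.01, size=(len(adj), data.shape[1]))
    for epoch in range(epochs):
        frac = epoch / epochs
        radius = round(radius0 * (1 - frac))  # neighbourhood radius decreases
        lr = lr0 * (1 - frac)                 # learning rate decays (assumed)
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            region = hops[winner] <= radius   # nodes inside the region of influence
            weights[region] += lr * (x - weights[region])
    return weights


# Hypothetical data: 50 posture vectors of dimension 4 drawn from [0, 1].
data = np.random.default_rng(1).uniform(0.0, 1.0, size=(50, 4))
weights = train(data, icosahedron_adjacency())
```

Because the lattice is closed, the region of influence around a winning node never meets a boundary, which is the wrap-around behaviour described earlier in this section.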
