Firstly, the input posture vector $x$ is randomly selected from $T$, and introduced
to the SSOM. For each voxel, the best matching unit (BMU), $w_{(i,j,k)^*}$, is selected, i.e.,
$$(i,j,k)^* = \arg\min_{(i,j,k)} \{ E_{i,j,k} \} \tag{11.1}$$
where $E_{i,j,k}$ is the difference between the current input vector and the weight vectors
for all cluster units,
$$E_{i,j,k} = \varphi(u_{i,j,k}) \sum_{n=1}^{D} (x_n - w_{n,i,j,k})^2 \tag{11.2}$$
$x_n$ is the $n$-th component of the input vector, and $w_{n,i,j,k}$ is the weight from the
$n$-th input, $n = 1, 2, \ldots, D$, to the $(i,j,k)$-th node, $i = 1, 2, \ldots, I$,
$j = 1, 2, \ldots, J$, and $k = 1, 2, \ldots, K$. The function $\varphi(u_{i,j,k})$ is a
count-dependent non-decreasing function, used to prevent cluster under-utilization.
Secondly, information from $x$ is imparted to the weights of the winning cluster
unit ($w_{(i,j,k)^*}$) and all the units residing within the specified neighbourhood $NE(i,j,k)$
using,
$$w^{(new)}_{(i,j,k)} = w^{(old)}_{(i,j,k)} + \alpha \left[ x_n - w^{(old)}_{(i,j,k)} \right] \tag{11.3}$$
where
$$\alpha = \mu \left( NE(i,j,k) / NE_{initial} \right) \tag{11.4}$$
$\mu$ is a predefined learning rate, and $NE_{initial}$ is the initial neighborhood size in terms
of the number of units.
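A single training step built from Eqs. (11.1) to (11.4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not give the form of the count-dependent function, so the linear `phi` below is hypothetical, as are the function names, the dictionary of nodes keyed by (i, j, k), and the rule that the count is incremented for the winning unit only.

```python
def phi(count):
    # ASSUMPTION: the text only requires a count-dependent, non-decreasing
    # function; this linear form is hypothetical.
    return 1.0 + count


def error(x, w, count):
    # Eq. (11.2): count-weighted squared difference between input and weights.
    return phi(count) * sum((xn - wn) ** 2 for xn, wn in zip(x, w))


def find_bmu(x, weights, counts):
    # Eq. (11.1): the node (i, j, k) with the smallest error.
    return min(weights, key=lambda node: error(x, weights[node], counts[node]))


def learning_rate(mu, ne_size, ne_initial):
    # Eq. (11.4): alpha scales with the current neighbourhood size.
    return mu * ne_size / ne_initial


def update(x, weights, units, alpha):
    # Eq. (11.3): move each listed unit towards the input, in place.
    for node in units:
        weights[node] = [wn + alpha * (xn - wn) for xn, wn in zip(x, weights[node])]


# Two hypothetical nodes with D = 2 weights each.
weights = {(1, 1, 1): [0.0, 0.0], (1, 1, 2): [1.0, 1.0]}
counts = {(1, 1, 1): 0, (1, 1, 2): 0}
x = [0.9, 0.8]

bmu = find_bmu(x, weights, counts)                       # (1, 1, 2) is closest
alpha = learning_rate(mu=0.5, ne_size=1, ne_initial=2)   # 0.25
update(x, weights, [bmu], alpha)
counts[bmu] += 1  # ASSUMPTION: the count u is incremented for the winner
```

With this choice of `phi`, a node that has already won many times sees its error inflated, so a less used node can win the competition even when its weights are further from the input. This is one way to read the remark about preventing cluster under-utilization.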
This process of information sharing [i.e., Eq. (11.3)] allows the map nodes to
tune themselves to characteristic postures in the input space, while forcing nearby
nodes to tune to related or adjacent postures.
Thirdly, the same learning steps are repeated. At this point, as new input postures
are presented from the training set, new BMUs compete for their representation,
resulting in a locally organized distribution of key postures over nodes on the map.
Finally, learning is terminated after a maximum number of cycles has been reached.
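The four steps can be put together as a training loop along the following lines. This is a minimal sketch rather than the authors' implementation. The function names, the linear count penalty `1.0 + counts[node]`, the `neighbourhood` callback, and the toy chain of four nodes are all assumptions made for illustration; an actual SSOM arranges its nodes on a spherical lattice, which is not modelled here.

```python
import random


def train_ssom(training_set, weights, neighbourhood, mu, ne_initial, max_cycles, seed=0):
    rng = random.Random(seed)
    counts = {node: 0 for node in weights}  # u_{i,j,k}, one count per node
    for cycle in range(max_cycles):  # "Finally": stop after max_cycles
        x = rng.choice(training_set)  # "Firstly": random posture from T
        # BMU by Eqs. (11.1)-(11.2); the penalty 1 + count is an ASSUMPTION.
        bmu = min(
            weights,
            key=lambda node: (1.0 + counts[node])
            * sum((xn - wn) ** 2 for xn, wn in zip(x, weights[node])),
        )
        units = neighbourhood(bmu, cycle)  # hypothetical callback giving NE
        alpha = mu * len(units) / ne_initial  # Eq. (11.4)
        for node in units:  # "Secondly": Eq. (11.3)
            weights[node] = [wn + alpha * (xn - wn) for xn, wn in zip(x, weights[node])]
        counts[bmu] += 1  # ASSUMPTION: only the winner's count grows
    return weights, counts


def chain_neighbourhood(node, cycle, size=4, shrink_after=100):
    # ASSUMPTION: a toy 1-D chain whose neighbourhood shrinks to the BMU
    # alone after shrink_after cycles; the text gives no schedule.
    i, j, k = node
    if cycle >= shrink_after:
        return [node]
    return [(i, j, kk) for kk in (k - 1, k, k + 1) if 0 <= kk < size]


training_set = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
weights = {(0, 0, k): [0.5, 0.5] for k in range(4)}
weights, counts = train_ssom(training_set, weights, chain_neighbourhood,
                             mu=0.3, ne_initial=3, max_cycles=200)
```

Because the learning rate never exceeds `mu` here, each update is a weighted average of the old weights and the input, so the weights stay within the range spanned by the training data and the initial weights.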
11.4 Characterization of Dance Gesture Using Spherical Self-organizing Map
The SSOM was applied to characterize dance gesture in a dance training system
shown in Fig. 11.1 [340]. The Microsoft Kinect system provides 20 3D skeleton
points to represent each player (student) in the camera's field of view. These points
represent 20 joint positions of the body. In each frame, the normalized locations of
all 20 joint positions were utilized to construct a feature vector,
$x = [x_1 \ldots x_i \ldots x_{60}]^t$,
where $x_i$ is the $i$-th location of the joint in one of the x/y/z planes. By considering
all 20 joints in the three dimensions, the dimension of $x$ was 60. Here the location
$x_i$ was obtained by the normalization of its original value. This process took the hip
location as the reference point and calculated all other joints relative to the hip.
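The construction of the 60-dimensional feature vector can be illustrated with a short Python sketch. Several details are assumptions, since the text does not state them: the index of the hip joint (`hip_index=0`), the ordering of components as x, y, z per joint, and the reading of "normalization" as a plain subtraction of the hip position with no further scaling. The function name `posture_vector` is hypothetical.

```python
def posture_vector(joints, hip_index=0):
    """Build a 60-element posture vector from 20 (x, y, z) joint positions.

    ASSUMPTIONS: the hip is joints[hip_index], components are ordered
    x, y, z joint by joint, and normalization is translation only.
    """
    if len(joints) != 20:
        raise ValueError("expected 20 joints")
    hx, hy, hz = joints[hip_index]
    x = []
    for jx, jy, jz in joints:
        # Express every joint relative to the hip reference point.
        x.extend((jx - hx, jy - hy, jz - hz))
    return x


# A made-up frame of 20 joints, for illustration only.
frame = [(0.1 * j, 1.0 + 0.05 * j, 2.0) for j in range(20)]
x = posture_vector(frame)
```

Under these assumptions the three components belonging to the hip itself are always zero, so the vector describes the posture independently of where the player stands in the camera's field of view.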