is suitable for input analysis and for recognizing hand motion. As a perceptive
processing tool, this method aims to transform percepts into a sub-symbolic
representation that encodes an emotional state. In turn, this representation
provides the cognitive processing module with a suitable input.
First, the normal flow field is computed from the input sequence. The
expectation maximization (EM) algorithm is used to fit a Gaussian to the
normal flow histogram computed across the frame. The moving arm is
detected as the dominant region in the normal flow field, that is, the set of all
the points whose normal flow value is ≥ 4σ, where σ is the standard deviation
of the fitted Gaussian. Sample points are selected as those with a large
gradient value as well as a large normal flow value and whose gradient is
similar to that of their neighbours. The boundary of the arm is obtained as
the Dijkstra shortest path connecting all the sample points. Then, affine
transform parameters describing the global motion of the arm are estimated
from the analysis of the arm boundaries.
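The detection steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (normal_flow, fit_gaussian_em, moving_region_mask) are hypothetical, and the uniform outlier component in the EM fit, as well as measuring the 4σ threshold as a deviation from the fitted mean, are assumptions made for illustration. The normal flow is derived from the brightness constancy constraint as -I_t / |∇I|.

```python
import numpy as np

def normal_flow(prev, curr, eps=1e-3):
    """Signed normal flow (flow component along the image gradient).

    From brightness constancy Ix*u + Iy*v + It = 0, the component of the
    flow along the gradient direction is -It / |grad I|.
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    avg = 0.5 * (prev + curr)
    gy, gx = np.gradient(avg)      # np.gradient returns d/drow first, then d/dcol
    it = curr - prev               # temporal derivative
    mag = np.hypot(gx, gy)
    flow = np.where(mag > eps, -it / np.maximum(mag, eps), 0.0)
    return flow, mag

def fit_gaussian_em(values, n_iter=50):
    """EM fit of one Gaussian plus a uniform outlier component (assumed model)."""
    v = np.asarray(values, dtype=float).ravel()
    uniform = 1.0 / max(v.max() - v.min(), 1e-12)   # density of the outlier class
    mu, sigma, weight = np.median(v), v.std() + 1e-12, 0.9
    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian for each value
        g = weight * np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = g / (g + (1.0 - weight) * uniform)
        # M-step: weighted mean, standard deviation and mixing weight
        mu = (r * v).sum() / r.sum()
        sigma = np.sqrt((r * (v - mu) ** 2).sum() / r.sum()) + 1e-12
        weight = r.mean()
    return mu, sigma

def moving_region_mask(flow, k=4.0):
    """Points whose normal flow deviates from the fitted mean by at least k*sigma."""
    mu, sigma = fit_gaussian_em(flow)
    return np.abs(flow - mu) >= k * sigma
```

Given two consecutive grey-level frames, `flow, mag = normal_flow(prev, curr)` followed by `moving_region_mask(flow)` yields a candidate mask for the moving arm; the gradient magnitude `mag` could then be used to pick the sample points.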
Symbolic information is obtained via hierarchical clustering. Learning vector
quantization (LVQ) is used to compress affine parameter vectors and to derive
a labelled Voronoi tessellation in which each tile corresponds to a motion
primitive without a precise meaning. The next layer clusters label sequences
into sub-activities like up, down and circle. Finally, a robust matching
procedure based on nearest neighbour classification groups sub-activity sets
into complex sequences like striking, pounding, swirling (= repeated circle)
and so on.
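The first layer of this hierarchy can be sketched as follows. The text does not say which LVQ variant is used, so the basic LVQ1 update rule is assumed here, together with the availability of labelled training vectors; all function names are hypothetical. Assigning each vector to its nearest prototype is exactly the Voronoi tessellation induced by the prototypes.

```python
import numpy as np

def nearest_prototype(x, prototypes):
    """Index of the prototype whose Voronoi tile contains x."""
    return int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))

def train_lvq1(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: pull the winning prototype towards a sample with the same
    label, push it away otherwise. The learning rate decays linearly."""
    protos = np.array(prototypes, dtype=float)
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)
        for x, y in zip(samples, labels):
            k = nearest_prototype(x, protos)
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] += sign * alpha * (x - protos[k])
    return protos

def quantize(vectors, prototypes, proto_labels):
    """Replace each parameter vector by the label of its Voronoi tile."""
    return [proto_labels[nearest_prototype(v, prototypes)] for v in vectors]
```

Applied to a sequence of six-dimensional affine parameter vectors, `quantize` returns a sequence of primitive labels, which is the kind of input the next clustering layer would consume.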
Figures 2-4 show some steps of the approach explained above. Figure
2(a) shows an image taken from a 400-frame pounding sequence, while
Figure 2(b) shows an image from a 100-frame swirling sequence.
Images were captured by a SONY DFW-VL500 progressive-scan colour
camera at a frame rate of 30 frames per second, each frame being 320 ×
240 pixels.
Figure 3 (first row) shows the points with the maximum normal flow,
while the arm boundary is depicted in the second row.
Figure 4 (first row) shows the residual flow, which is computed as the
difference between the normal motion field given by the affine parameters
and the normal flow. Figure 4 (second row) shows the affine flows
re-estimated after outlier removal, again using EM to fit a Gaussian
distribution to the residual flow.
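The residual computation and re-estimation can be sketched under a few assumptions that the text does not state: a six-parameter affine flow model u = a0 + a1·x + a2·y, v = a3 + a4·x + a5·y, a linear least-squares fit of the normal flow constraint, and a plain mean and standard deviation of the residual in place of the EM-fitted Gaussian. The function names and the rejection factor k are hypothetical.

```python
import numpy as np

def affine_normal_flow(params, x, y, nx, ny):
    """Normal motion field predicted by the affine parameters.

    (nx, ny) is the unit gradient direction at each point (x, y).
    """
    u = params[0] + params[1] * x + params[2] * y
    v = params[3] + params[4] * x + params[5] * y
    return nx * u + ny * v

def fit_affine_normal_flow(x, y, nx, ny, un):
    """Least-squares affine parameters from measured normal flow un.

    Each point contributes one linear constraint nx*u + ny*v = un.
    """
    design = np.column_stack([nx, nx * x, nx * y, ny, ny * x, ny * y])
    params, *_ = np.linalg.lstsq(design, un, rcond=None)
    return params

def refit_without_outliers(x, y, nx, ny, un, k=2.5):
    """Fit, compute the residual flow, drop outliers, and fit again."""
    params = fit_affine_normal_flow(x, y, nx, ny, un)
    residual = affine_normal_flow(params, x, y, nx, ny) - un
    # Stand-in for the EM Gaussian fit: plain mean and standard deviation
    keep = np.abs(residual - residual.mean()) < k * residual.std()
    params = fit_affine_normal_flow(x[keep], y[keep], nx[keep], ny[keep], un[keep])
    return params, keep
```

The arrays `x`, `y`, `nx`, `ny` and `un` are one-dimensional and aligned, one entry per boundary point; `keep` marks the points retained for the second fit.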
3.1.2 Facial data elaboration
All animals, and humans above all, use the face as the main channel for
non-verbal communication. Expression is composed of several features like eye
movements, mouth and eyebrow position, configuration of facial muscles and
so on. All these signals are part of a perceptive model that is used as the basis
for understanding emotions in the user [11].