[Figure: audio signals (40 ms frames, 13-dimensional MFCC features) and 0.6 s video segments (dense 144-dimensional HoG & HoF features) are sparse-coded with separate dictionaries into a 1024-dimensional SC-based audio representation and a 2048-dimensional SC-based HoG & HoF representation.]
Fig. 11.1 The generation process of SC-based audio and visual representations for video segments.
Each video segment is 0.6 s long. Separate dictionaries are constructed for MFCC, HoG and HoF
and used to generate 1,024-dimensional representations. Each HoG and HoF descriptor is
144-dimensional.
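The excerpt does not state which sparse-coding solver or pooling step turns the many local descriptors of a segment into one 1,024-dimensional vector, so the following is only a minimal sketch under stated assumptions: L1-regularized coding solved with ISTA, followed by max pooling of the absolute codes over all descriptors in the segment. The function names, the regularization weight `lam`, and the random unit-norm dictionaries (standing in for learned ones) are hypothetical.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x: (d,) local descriptor; D: (d, k) dictionary whose columns are atoms.
    """
    L = np.linalg.norm(D, ord=2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a

def segment_representation(descriptors, D, lam=0.1):
    """Code every descriptor of one segment, then max-pool (assumed) the absolute codes."""
    codes = np.stack([sparse_code(x, D, lam) for x in descriptors])
    return np.abs(codes).max(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dicts = {}
    for name in ("hog", "hof"):
        D = rng.normal(size=(144, 1024))  # stand-in for a learned dictionary
        dicts[name] = D / np.linalg.norm(D, axis=0, keepdims=True)
    hog = segment_representation(rng.normal(size=(3, 144)), dicts["hog"])
    hof = segment_representation(rng.normal(size=(3, 144)), dicts["hof"])
    visual = np.concatenate([hog, hof])  # HoG and HoF codes side by side
    print(visual.shape)  # (2048,)
```

Concatenating the two 1,024-dimensional codes is one plausible reading of how the 2048-dimensional HoG & HoF representation in the figure arises; the MFCC stream would be coded the same way with its own dictionary.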
11.3.1.2 Low-Level Visual Representation
Filmmakers often use motion to elicit particular perceptions in the audience [33].
We therefore use motion-related descriptors for the visual representation of video
segments. One of these is ViF (Violent Flows), an efficient motion descriptor. We
compute a ViF descriptor for each video segment to represent statistics of
flow-vector magnitude changes over time. For a detailed explanation of how this
descriptor is computed, the reader is referred to [18].
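Since the computation itself is deferred to [18], here is a simplified sketch of the Violent Flows idea as it is commonly described, not the exact implementation used in this chapter. It assumes per-pixel optical-flow magnitudes have already been computed by some dense flow method. Per-pixel magnitude changes between consecutive flow fields are binarized against an adaptive per-frame threshold (the mean change), averaged over time, and then summarized by histograms over a spatial grid of cells. The function name, the 4x4 grid, and the 20 bins are hypothetical choices.

```python
import numpy as np

def vif_descriptor(flow_mag, grid=(4, 4), n_bins=20):
    """Simplified ViF-style descriptor.

    flow_mag: (T, H, W) optical-flow magnitudes for T consecutive frame pairs
    (assumed precomputed). Returns grid[0] * grid[1] * n_bins values.
    """
    if flow_mag.shape[0] < 2:
        raise ValueError("need at least two flow-magnitude maps")
    change = np.abs(np.diff(flow_mag, axis=0))       # |m_t - m_{t-1}| per pixel
    theta = change.mean(axis=(1, 2), keepdims=True)  # adaptive threshold per frame
    binary = (change >= theta).astype(float)          # 1 where the change is significant
    mean_map = binary.mean(axis=0)                    # mean magnitude-change map in [0, 1]

    H, W = mean_map.shape
    gh, gw = grid
    hists = []
    for i in range(gh):
        for j in range(gw):
            cell = mean_map[i * H // gh:(i + 1) * H // gh,
                            j * W // gw:(j + 1) * W // gw]
            counts, _ = np.histogram(cell, bins=n_bins, range=(0.0, 1.0))
            hists.append(counts / max(cell.size, 1))  # normalized cell histogram
    return np.concatenate(hists)
```

Because the threshold adapts to each frame, the descriptor reflects where motion magnitude changes unusually much rather than absolute motion strength.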
In addition to motion information, the static content of video frames is also important
for evoking particular perceptions in the audience [33]. We therefore also use static
content representations; more specifically, we employ affect-related static visual
descriptors. Inspired by the work presented in [25], we compute the mean and standard
deviation of saturation, brightness, and hue in the HSL color space. We also compute
the colorfulness of the keyframe of each video segment using the method in [17], where
the keyframe is taken to be the middle frame of the segment.
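The static descriptors above can be sketched as follows, under several assumptions not confirmed by this excerpt: "brightness" is read as HSL lightness, the statistics are computed on the keyframe, hue is averaged naively even though it is circular, and the colorfulness method in [17] is taken to be the Hasler and Süsstrunk metric on 0 to 255 RGB values. All function names are hypothetical.

```python
import colorsys
import numpy as np

def keyframe(frames):
    """Keyframe = middle frame of the segment."""
    return frames[len(frames) // 2]

def hsl_stats(frame):
    """Mean/std of saturation, lightness (as brightness) and hue for a uint8 (H, W, 3) RGB frame."""
    pixels = frame.reshape(-1, 3) / 255.0
    hls = np.array([colorsys.rgb_to_hls(*px) for px in pixels])  # (h, l, s) in [0, 1]
    h, l, s = hls[:, 0], hls[:, 1], hls[:, 2]
    # Note: hue is circular; the plain arithmetic mean is a simplification.
    return np.array([s.mean(), s.std(), l.mean(), l.std(), h.mean(), h.std()])

def colorfulness(frame):
    """Hasler and Suesstrunk colorfulness (assumed to match [17]) on 0-255 RGB values."""
    r, g, b = (frame[..., c].astype(float) for c in range(3))
    rg = r - g
    yb = 0.5 * (r + g) - b
    return np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
```

For a segment stored as a list of frames, `kf = keyframe(frames)` followed by `np.append(hsl_stats(kf), colorfulness(kf))` gives a seven-value static descriptor under these assumptions.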
11.3.1.3 Mid-Level Visual Representation
Mid-level visual representations are based on HoG and HoF features extracted from
the visual content of video segments of 0.6 s length. HoG and HoF descriptors are
densely sampled and computed for subvolumes of video segments (HoG descriptors
 