summarizing and retrieval of HD video content. According to the architecture of
emerging information systems containing HD video, various levels of granularity
can be accessed, indexed and retrieved. The wavelet transform, a hierarchical transform widely
used in current and future scalable HD standards, is an excellent basis for this.
3 HD Content Indexing Using Patch Descriptors in the Wavelet Domain
In this part of the chapter, we present a method for statistically comparing HD video
segments using a sparse multiscale description of their content. This description is
both spatial and temporal and relies on the following concepts: 1) a sparse and
multiscale transform of the video content; 2) a local patch description obtained by
grouping spatially or temporally coherent pieces of information; 3) a global statistical
description of the video content, built from the multiple occurrences of similar patches
throughout the video, that is robust to the usual geometric or radiometric video
transformations. These descriptors are naturally compared in a statistical fashion. The
proposed global dissimilarity is a weighted combination of Kullback-Leibler divergences
between the probability densities of the different kinds of patches. The dissimilarity is
estimated non-parametrically in a k-th nearest neighbor framework, which enables us to cope
with the high dimensionality of the probability density functions at stake. The method is
designed to compare short video segments (e.g., Groups of Pictures (GOPs) of eight frames);
to compare longer videos, we sum the dissimilarities between their consecutive GOPs. In the
sequel, we describe how to extract the video description from GOPs and how to estimate the
proposed dissimilarity. Finally, we report results obtained in content-based query
experiments.
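To make the estimation step concrete, the following is a minimal sketch of a k-th nearest neighbor estimate of the Kullback-Leibler divergence, combined into a weighted per-GOP dissimilarity and summed over consecutive GOPs. The specific estimator used here (the log-ratio of k-th neighbor distances) is a standard one and is an assumption, since the text above does not give the formula. The function names, the dictionary layout of the patch sets and the weights are also hypothetical.

```python
import math
import random

def kth_nn_distance(point, samples, k, skip_self=False):
    """Euclidean distance from `point` to its k-th nearest neighbor in `samples`."""
    dists = sorted(math.dist(point, s) for s in samples)
    # If `point` belongs to `samples`, dists[0] is its zero distance to itself.
    d = dists[k] if skip_self else dists[k - 1]
    return max(d, 1e-12)  # guard against log(0) when samples are duplicated

def knn_kl_divergence(x, y, k=1):
    """k-NN estimate of KL(p || q) from patch samples x ~ p and y ~ q.

    Assumed estimator: (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the k-th neighbor distance of x[i] within x and
    nu_k(i) is its k-th neighbor distance to y.
    """
    n, m, dim = len(x), len(y), len(x[0])
    total = 0.0
    for xi in x:
        rho = kth_nn_distance(xi, x, k, skip_self=True)
        nu = kth_nn_distance(xi, y, k)
        total += math.log(nu / rho)
    return dim * total / n + math.log(m / (n - 1))

def gop_dissimilarity(patches_a, patches_b, weights, k=1):
    """Weighted combination of KL estimates, one term per kind of patch.

    `patches_a` and `patches_b` map a patch kind (hypothetical keys) to a
    list of patch vectors; `weights` maps the same keys to their weights.
    """
    return sum(
        weights[kind] * knn_kl_divergence(patches_a[kind], patches_b[kind], k)
        for kind in weights
    )

def video_dissimilarity(gops_a, gops_b, weights, k=1):
    """Sum of the dissimilarities between consecutive GOPs of two videos."""
    return sum(
        gop_dissimilarity(a, b, weights, k) for a, b in zip(gops_a, gops_b)
    )

# Usage on synthetic 2-D "patches" (illustrative data only).
random.seed(0)
close = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
also_close = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
shifted = [[random.gauss(3, 1), random.gauss(3, 1)] for _ in range(100)]
print(knn_kl_divergence(close, also_close), knn_kl_divergence(close, shifted))
```

The brute-force neighbor search costs O(n^2) distance evaluations per patch set, which is acceptable for a sketch; a spatial index would be the natural replacement for larger patch sets.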
3.1 Sparse Multiscale Patches and Motion Patches Descriptors
Our description of a GOP separately extracts spatial information about the scene
and temporal information about the motion within the GOP. The spatial information
is extracted from the first frame of the GOP, while the temporal information is
extracted from the motion of blocks throughout the GOP.
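As an illustration of this split, the sketch below groups frames into GOPs, takes the first frame as the spatial input, and follows one block through the GOP to obtain its motion. The exhaustive block matching with a sum-of-absolute-differences cost is an assumption chosen for simplicity; the text does not state how block motion is estimated. The function names, block size and search range are hypothetical.

```python
import numpy as np

def split_into_gops(frames, gop_size=8):
    """Group frames into consecutive GOPs, dropping an incomplete tail."""
    n_full = len(frames) // gop_size
    return [frames[g * gop_size:(g + 1) * gop_size] for g in range(n_full)]

def block_motion(prev, curr, top, left, size=8, search=4):
    """Displacement (dy, dx) that best maps a block of `prev` into `curr`.

    Assumed method: exhaustive search minimizing the sum of absolute
    differences (SAD) within +/- `search` pixels.
    """
    block = prev[top:top + size, left:left + size].astype(float)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate positions that fall outside the frame.
            if y < 0 or x < 0 or y + size > curr.shape[0] or x + size > curr.shape[1]:
                continue
            cost = np.abs(curr[y:y + size, x:x + size] - block).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def gop_inputs(gop, top, left, size=8, search=4):
    """Return the first frame (spatial input) and the trajectory of one
    block across the GOP (temporal input)."""
    trajectory = []
    y, x = top, left
    for prev, curr in zip(gop[:-1], gop[1:]):
        dy, dx = block_motion(prev, curr, y, x, size, search)
        trajectory.append((dy, dx))
        y, x = y + dy, x + dx  # follow the block into the next frame
    return gop[0], trajectory

# Usage on a synthetic sequence that translates by (1, 2) pixels per frame.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
frames = [np.roll(base, (t, 2 * t), axis=(0, 1)) for t in range(8)]
first_frame, trajectory = gop_inputs(split_into_gops(frames)[0], top=8, left=8)
print(trajectory)
```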
3.1.1 Spatial Descriptors: Sparse Multiscale Patches (SMP)
A structure in an image I can be identified by the coherence (or correlation) of the
multiscale coefficients of I around a particular location p and a particular spatial
scale k. A patch of the sparse multiscale patches (SMP) description of an image
I [30, 31] is a group of multiscale coefficients that vary simultaneously in the presence
of a spatial structure: these are the coefficients of all color channels that are neighbors
across scale and location.
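The grouping described above can be sketched as follows. This is an illustrative construction under stated assumptions, not the exact definition of [30, 31]: the multiscale transform is reduced to a single Haar-like detail band per scale, a patch gathers a coefficient, its four spatial neighbors and its parent at the next coarser scale for every color channel, and sparsity is obtained by keeping the patches of largest energy. All function names and parameters are hypothetical.

```python
import numpy as np

def haar_details(channel, levels):
    """One diagonal Haar-like detail band per scale, finest scale first.

    Assumes both image dimensions are divisible by 2 ** levels.
    """
    details, approx = [], channel.astype(float)
    for _ in range(levels):
        a, b = approx[0::2, 0::2], approx[0::2, 1::2]
        c, d = approx[1::2, 0::2], approx[1::2, 1::2]
        details.append((a - b - c + d) / 4.0)  # detail coefficients
        approx = (a + b + c + d) / 4.0         # coarser approximation
    return details

def build_patches(image, levels=3):
    """Group coefficients that are neighbors across scale and location.

    `image` is an H x W x C array. Returns {scale k: array of patches},
    each patch holding 6 coefficients per color channel.
    """
    pyramids = [haar_details(image[:, :, c], levels) for c in range(image.shape[2])]
    patches = {}
    for k in range(levels - 1):  # the coarsest scale has no parent
        rows, cols = pyramids[0][k].shape
        vectors = []
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                v = []
                for pyr in pyramids:
                    w, parent = pyr[k], pyr[k + 1]
                    v += [w[i, j], w[i - 1, j], w[i + 1, j],
                          w[i, j - 1], w[i, j + 1], parent[i // 2, j // 2]]
                vectors.append(v)
        patches[k] = np.array(vectors)
    return patches

def keep_largest(patches, fraction=0.1):
    """Sparsify (assumed rule): keep the patches with the largest energy."""
    energy = (patches ** 2).sum(axis=1)
    n_keep = max(1, int(fraction * len(patches)))
    return patches[np.argsort(energy)[::-1][:n_keep]]

# Usage on a random three-channel image (illustrative data only).
image = np.random.default_rng(0).random((32, 32, 3))
smp = {k: keep_largest(p, 0.25) for k, p in build_patches(image).items()}
print({k: p.shape for k, p in smp.items()})
```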
More precisely, we write $w^{I_c}_{k,p}$ for the multiscale coefficient of channel c of image
I at scale k and location p (this would be the dot product of channel c of image
I with a waveform of scale k centered at location p). With the detail coefficients