concerned. Two recent standards designed for HD video and film content have this
property: H.264 and Motion JPEG2000, known as (M)JPEG2000 [1]. While H.264
has been designed for HD TV and follows the principles of previous standards,
in the sense that its transform (the Integer Block Transform) is a variation of
the DCT and does not have the scalability property, the (M)JPEG2000 standard is
naturally scalable, owing to the scalable nature of the transform it uses: the
Discrete Wavelet Transform (DWT).
(M)JPEG2000 is the part of the JPEG2000 standard devoted to motion sequences of
images. Nevertheless, contrary to H.264, it does not encode motion information:
each frame is encoded independently, in intra-frame mode, by JPEG2000. In the
following, we give insight into the JPEG2000 standard [5].
2.1.1
(M)JPEG2000 Standard
Initiated in March 1997 and becoming an international ISO standard in December
2000, JPEG2000 exhibited a new level of efficiency, specifically for
high-resolution (HD) images. The specifications of the DCI (Digital Cinema Ini-
tiatives, LLC [6]) made (M)JPEG2000 the digital cinema compression standard.
(M)JPEG2000 is the extension of the JPEG2000 standard to video: each frame
in the video sequence is considered separately and encoded with JPEG2000. Fur-
thermore, (M)JPEG2000 is becoming the common standard for archiving cultural
cinematographic and video heritage [7], offering a better quality/compression
trade-off than previously used solutions. The JPEG2000 standard follows the ideas
initially proposed in MPEG-4 [8] for object-based coding, namely the possibility
of encoding Regions of Interest (ROI) more precisely in each frame or in a single
image. The industrial reality of this advanced feature of JPEG2000 turned
out to be much the same as with MPEG-4: despite a rich body of research work
proposing various methods for ROI extraction (e.g. [9, 10]), JPEG2000 as commonly
used is limited to encoding the whole frame. More precisely, an image frame
is modeled as a set of tiles on which the coding process operates independently, as
depicted in Figure 1, a frame being typically considered as a single tile.
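The tiling step above can be sketched as follows; this is a minimal illustration of partitioning a frame into independently coded tiles, with illustrative (not standard-mandated) tile dimensions:

```python
# Sketch: splitting a frame into independent tiles, as in JPEG2000 tiling.
# Tile and frame sizes here are illustrative, not taken from the standard.

def split_into_tiles(frame, tile_h, tile_w):
    """Partition a 2-D frame (list of rows) into a list of tiles.

    Border tiles may be smaller when the frame dimensions are not
    multiples of the tile size, as JPEG2000 allows.
    """
    h, w = len(frame), len(frame[0])
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            tile = [row[left:left + tile_w]
                    for row in frame[top:top + tile_h]]
            tiles.append(tile)
    return tiles

# A 4x6 "frame" of sample values split into 2x3 tiles yields four tiles.
frame = [[r * 6 + c for c in range(6)] for r in range(4)]
tiles = split_into_tiles(frame, 2, 3)
```

Each tile would then go through the transform, quantization, and coding stages on its own; in digital cinema practice the whole frame is usually a single tile.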
The core of the standard is the DWT, which, in the case of lossy compression, is
realized by high-pass and low-pass filters designed for zero-mean signals. This is
why a level offset is necessary at the pre-processing step. Furthermore, the standard
operates in the YCbCr color space; hence, if the source is in RGB, a linear color
transform has to be applied. The resulting frame then undergoes the DWT, which we
describe below. The transform coefficients are quantized to reduce the quantity of
information, and entropy coding known as EBCOT (Embedded Block Coding with Opti-
mized Truncation) is performed on these quantized values. At the first step (Tier 1),
context modeling and arithmetic coding are realized; at the second step (Tier 2),
the bit allocation for the output bit stream is performed.
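The pre-processing and quantization steps just described can be sketched as follows. This is a simplified per-sample illustration: the color-transform coefficients are those of the irreversible color transform of lossy JPEG2000, while the quantization step size `delta` is an arbitrary assumed value (the standard signals a step size per subband):

```python
import math

def level_offset(sample, bit_depth=8):
    """Shift an unsigned sample to a signed, roughly zero-mean range,
    as required by the wavelet filters designed for zero-mean signals."""
    return sample - (1 << (bit_depth - 1))

def ict(r, g, b):
    """Irreversible color transform (RGB -> YCbCr) of lossy JPEG2000."""
    y  =  0.299   * r + 0.587   * g + 0.114   * b
    cb = -0.16875 * r - 0.33126 * g + 0.5     * b
    cr =  0.5     * r - 0.41869 * g - 0.08131 * b
    return y, cb, cr

def quantize(coeff, delta):
    """Uniform dead-zone scalar quantization of a transform coefficient:
    q = sign(c) * floor(|c| / delta)."""
    return int(math.copysign(math.floor(abs(coeff) / delta), coeff))

# Example: an 8-bit sample 200 is shifted to 72; with an assumed
# step size of 4 its quantization index is 18.
index = quantize(level_offset(200), 4)
```

The quantization indices, organized into code-blocks, are what EBCOT's Tier 1 then encodes.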
The decoder proceeds in the inverse order to reconstruct the frame. In the
lossy scheme the original pixel values cannot be recovered, but the quantization
matrix is designed to take into account the psycho-visual properties of the Human
Visual System.
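The decoder-side dequantization can be sketched as follows; it makes concrete why the original values cannot be recovered: every coefficient falling in a quantization bin is reconstructed to the same point of that bin (here the mid-point, a common choice; the step size is again an assumed value):

```python
def dequantize(q, delta, r=0.5):
    """Reconstruct a coefficient from its quantization index q.

    r is the reconstruction offset within the bin (0.5 = mid-point);
    all coefficients that shared the bin come back as this one value.
    """
    if q == 0:
        return 0.0
    sign = 1.0 if q > 0 else -1.0
    return sign * (abs(q) + r) * delta

# A coefficient of 72.0 quantized with step 4 has index 18; it is
# reconstructed as 74.0, not 72.0 -- the irreversible quantization error.
recovered = dequantize(18, 4)
```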