The Development and Standardization of Ultra High Definition Video Technology - High-Quality Visual Experience


3.2.2.1 DCT

The discrete cosine transform (DCT) converts a spatial-domain signal into the frequency domain using a window of fixed width. Usually, a picture is divided into N×N pixel blocks (N pixels in both the horizontal and vertical directions), and the transform is performed on each pixel block. The DCT is expressed as follows:

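$$F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}$$

where

$$C(u),\ C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & (u, v = 0) \\ 1 & (u, v \neq 0) \end{cases}$$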
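The forward transform can be sketched directly from its definition. The following is a minimal, unoptimized illustration that assumes the common 2/N normalization with C(0) = 1/√2 and C(k) = 1 otherwise; the function name `dct2` and the choice of `block[x][y]` indexing are illustrative assumptions, not taken from the text.

```python
import math


def dct2(block):
    """2D DCT of an N x N block, evaluated term by term (O(N^4))."""
    n = len(block)

    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = (2 / n) * c(u) * c(v) * total
    return coeffs


# A flat 8x8 block: all of its energy should land in the DC coefficient F(0, 0).
flat = [[10] * 8 for _ in range(8)]
F = dct2(flat)
print(round(F[0][0], 6))  # 80.0
```

For a constant block only F(0, 0) is non-zero (up to rounding error), because every other basis pattern sums to zero over the block.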

The inverse transform (IDCT), on the other hand, converts a transformed signal back to the spatial domain and is expressed as follows:

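$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}.$$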
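The inverse transform can be sketched in the same direct way. This is a minimal illustration that assumes the 2/N normalization with C(0) = 1/√2 and C(k) = 1 otherwise; the name `idct2` and the `coeffs[u][v]` indexing are illustrative assumptions.

```python
import math


def idct2(coeffs):
    """2D inverse DCT of an N x N coefficient block, evaluated term by term."""
    n = len(coeffs)

    def c(k):
        # Same normalization factor as in the forward transform.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    block = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            block[x][y] = (2 / n) * total
    return block


# A coefficient block holding only a DC term reconstructs to a flat block.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 80.0
pixels = idct2(dc_only)
print(round(pixels[3][5], 6))  # 10.0
```

With N = 8, a lone DC coefficient of 80 maps to (2/8) · (1/2) · 80 = 10 at every pixel position.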

As an example, the transform basis patterns of the two-dimensional DCT for the 8×8 case are shown in Fig. 13.

After the DCT is applied to a video signal, a significant portion of the energy tends to be concentrated in the DCT coefficients of the low-frequency bands, even if the pixel block itself shows no statistical bias. Coding is therefore performed by exploiting both the characteristics of the human visual system and the statistical bias of the image signal in the DCT coefficient domain. An example of an image after transformation by the DCT is shown in Fig. 14.
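The energy concentration described above can be checked numerically. The sketch below assumes the orthonormal DCT (2/N normalization with C(0) = 1/√2) written in separable matrix form, F = T f Tᵀ, and uses a hypothetical smooth 8×8 ramp as the test block; the helper names are illustrative.

```python
import math


def dct_matrix(n):
    """Basis matrix T with T[u][x] = sqrt(2/n) * C(u) * cos((2x+1)u*pi/(2n))."""
    return [[math.sqrt(2 / n) * (1 / math.sqrt(2) if u == 0 else 1.0)
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)]
            for u in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


n = 8
# Hypothetical smooth block: a gentle ramp in both directions.
block = [[100 + 10 * x + 5 * y for y in range(n)] for x in range(n)]

T = dct_matrix(n)
F = matmul(matmul(T, block), transpose(T))  # separable 2D DCT

spatial_energy = sum(p * p for row in block for p in row)
total_energy = sum(c * c for row in F for c in row)
# Energy held by the four lowest-frequency coefficients (upper-left 2x2 corner).
low_energy = sum(F[u][v] ** 2 for u in range(2) for v in range(2))
low_fraction = low_energy / total_energy

print(f"energy in lowest 2x2 coefficients: {low_fraction:.3f}")
```

Because this form of the transform is orthonormal, the total energy is the same in both domains, and for this ramp more than 97% of it sits in the upper-left 2×2 corner of the coefficient block.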

After quantization, the DCT coefficients are encoded using a zigzag scan and run-length coding. Run-length coding represents each run of identical consecutive symbols as a (symbol, run length) pair. DCT coefficients with high power tend to be concentrated in the low-frequency bands, and the power decreases, even down to zero, as the frequency increases. The quantized indexes obtained by quantizing the DCT coefficients are therefore scanned in a zigzag pattern from the low frequencies (upper left) to the high frequencies (lower right) and rearranged into a one-dimensional series. This series is expressed as pairs consisting of the number of consecutive zeros (zero run) and the non-zero value that follows them (level). Once the last non-zero value has been coded, a special symbol called EOB (end of block) is emitted so that the remaining zeros need not be coded. This process exploits the statistical nature of the series: symbols with a large level typically have a short zero run, and symbols with a long zero run typically have a small level. A variable-length code can therefore be assigned to each (zero run, level) combination, with shorter codes for more probable symbols and longer codes for less probable ones. An example of the zigzag scan and run-length coding adopted in MPEG-2 is shown in Fig. 15.
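The scan and pairing steps can be sketched as follows. This is a simplified illustration, not the MPEG-2 bitstream syntax: it treats every coefficient uniformly (a real codec may handle the DC term separately), uses only the conventional zigzag order, and stops before the variable-length code assignment. The function names and the 4×4 block of quantized indexes are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_encode(block):
    """Encode quantized indexes as (zero run, level) pairs followed by 'EOB'."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1  # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied, not coded
    return pairs


# Hypothetical quantized indexes: large values near the upper left, zeros elsewhere.
quantized = [
    [12, 5, 0, 0],
    [-3, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(run_level_encode(quantized))
# [(0, 12), (0, 5), (0, -3), (5, 1), 'EOB']
```

The sixteen indexes collapse into four pairs plus EOB; the seven zeros after the final level of 1 cost nothing beyond the EOB symbol.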

