Information Technology Reference
In-Depth Information
3.2.2.1 DCT
DCT converts a spatial-domain signal into the frequency domain using a
fixed-width window for the transformation. Usually, a picture is divided into
NxN pixel blocks (N pixels in both the horizontal and vertical directions), and
the transform is performed on each pixel block. The DCT is expressed as follows,
$$
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}
$$

where

$$
C(u),\, C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & (u, v = 0) \\ 1 & (u, v \neq 0) \end{cases}.
$$
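The definition above can be evaluated directly with four nested loops. The following is a minimal sketch in plain Python, not an optimized implementation; the function name `dct2` and the flat test block are assumptions made for illustration only.

```python
import math

def dct2(block):
    """2-D DCT of an N x N block, evaluated term by term from the definition."""
    n = len(block)

    def c(k):
        # Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = (2.0 / n) * c(u) * c(v) * total
    return coeffs

# Hypothetical input: a flat 8x8 block in which every pixel equals 100.
flat = [[100.0] * 8 for _ in range(8)]
F = dct2(flat)

# With no variation inside the block, only the DC coefficient F(0, 0) is non-zero:
# (2/8) * (1/2) * 64 * 100 = 800.
print(round(F[0][0], 6))  # 800.0
```

The direct form costs on the order of N^4 multiplications per block, so practical encoders generally rely on separable or fast algorithms instead; the sketch is meant only to mirror the formula.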
On the other hand, the inverse transform (IDCT) reconverts a transformed signal
to the spatial domain and is expressed as follows,
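$$
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}.
$$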
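The inverse transform can be sketched in the same direct way. The code below assumes the usual orthonormal choice C(0) = 1/sqrt(2); the name `idct2` and the DC-only coefficient block are hypothetical and serve only as an illustration.

```python
import math

def idct2(coeffs):
    """2-D inverse DCT of an N x N coefficient block, evaluated from the definition."""
    n = len(coeffs)

    def c(k):
        # Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    block = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            block[x][y] = (2.0 / n) * total
    return block

# Hypothetical input: only the DC coefficient is non-zero.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 800.0
pixels = idct2(dc_only)

# Every pixel becomes (2/8) * (1/2) * 800 = 100, i.e. a flat block.
print(round(pixels[3][5], 6))  # 100.0
```

A block that carries only a DC coefficient reconstructs to a constant block, which is the inverse of the flat-block case for the forward transform.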
The transform basis patterns of the two-dimensional DCT for the 8x8 case are
shown as an example in Fig. 13.
After the DCT is applied to a video signal, a significant portion of the energy
tends to be concentrated in the low-frequency DCT coefficients, even if the
pixel values within the block themselves show no statistical bias. Coding
therefore exploits both the characteristics of the human visual system and this
statistical bias in the DCT coefficient domain of an image signal. An example of
an image after transformation by the DCT is shown in Fig. 14.
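The energy concentration described above can be checked numerically. The sketch below uses the equivalent separable matrix form F = A f A^T, where A holds the one-dimensional DCT basis vectors. The helper names (`dct_matrix`, `matmul`, `transpose`, `energy`) and the smooth ramp block are assumptions chosen for illustration; real image blocks will behave less ideally.

```python
import math

def dct_matrix(n):
    """Rows are 1-D DCT basis vectors: A[u][x] = sqrt(2/n) * C(u) * cos((2x+1)u*pi/(2n))."""
    rows = []
    for u in range(n):
        c = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
        rows.append([math.sqrt(2.0 / n) * c
                     * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    # Plain matrix product of two lists of lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def energy(m):
    # Sum of squared entries.
    return sum(v * v for row in m for v in row)

n = 8
# Hypothetical smooth block: a gentle brightness ramp.
block = [[100.0 + 10.0 * x + 5.0 * y for y in range(n)] for x in range(n)]

a = dct_matrix(n)
coeffs = matmul(matmul(a, block), transpose(a))  # F = A f A^T

# Share of the total energy held by the four lowest-frequency coefficients.
low = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
fraction = low / energy(coeffs)
print(f"energy in the 2x2 low-frequency corner: {fraction:.4f}")
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels; for this smooth block nearly all of it sits in the upper-left corner.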
After quantization, the DCT coefficients are encoded using a zigzag scan and
run-length coding. Run-length coding represents each run of identical
consecutive symbols as a (symbol, length) pair. High-power DCT coefficients tend
to be concentrated in the low-frequency bands, and the power falls, even down to
zero, as the frequency increases. The quantized indexes obtained by quantizing
the DCT coefficients are scanned in a zigzag pattern from the low frequencies
(upper left) to the high frequencies (lower right) and rearranged into a
one-dimensional series. This series is expressed as pairs consisting of the
number of consecutive zeros (zero run) and the non-zero value that follows them
(level). Once the last non-zero value has been coded, a special symbol called
EOB (end of block) is emitted so that the remaining zeros need not be coded.
This process exploits the statistical nature of the signal series: symbols with
a large level typically have a short zero run, and symbols with a long zero run
typically have a small level. A variable-length code can therefore be assigned
to each (zero run, level) combination, with shorter codes for more probable
symbols and longer codes for less probable ones. An example of the zigzag scan
and run-length coding adopted in MPEG-2 is shown in Fig. 15.
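The scan and pairing steps can be sketched as follows. This is a simplified illustration, assuming the conventional zigzag order and a small hypothetical 4x4 block of quantized indexes; it omits codec-specific details such as the actual variable-length code tables, and the names `zigzag_order` and `run_level_encode` are not taken from any standard.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in conventional zigzag order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd anti-diagonals run downwards (row increasing),
        # even anti-diagonals run upwards (row decreasing).
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level_encode(scanned):
    """Turn a 1-D series into (zero run, level) pairs terminated by 'EOB'."""
    pairs = []
    run = 0
    for value in scanned:
        if value == 0:
            run += 1                    # extend the current zero run
        else:
            pairs.append((run, value))  # zeros seen so far, then the level
            run = 0
    pairs.append("EOB")                 # trailing zeros are implied by EOB
    return pairs

# Hypothetical quantized indexes: non-zero values cluster at the upper left.
quantized = [
    [12, 3, 1, 0],
    [-2, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

scanned = [quantized[r][c] for r, c in zigzag_order(4)]
pairs = run_level_encode(scanned)
print(pairs)  # [(0, 12), (0, 3), (0, -2), (2, 1), 'EOB']
```

The sixteen indexes collapse to four pairs plus EOB, and the only pair with a non-zero run, (2, 1), also carries the smallest level, which matches the tendency described above.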