Scalable H.264 Wireless Video Transmission over MIMO-OFDM Channels - Advanced Concepts for Intelligent Vision Systems

As mentioned previously, SVC produces video frames that are partitioned into FGS layers. We assume that each layer of each frame is packetized into constant-size packets of size γ for transmission. At the receiver, any unrecoverable error in a packet causes the packet to be dropped, and hence the layer to which the packet belongs is lost. We assume that the channel coding rate and constellation used for transmitting the base layers of all key pictures are such that they are received error-free. Because SVC encoding and decoding are done on a GOP basis, the frames within a GOP can be used for error concealment. When a frame is lost, temporal error concealment is applied at the decoder: the lost frame is replaced by the nearest available frame, searching in both decreasing and increasing sequential order but only among frames of the same or lower temporal levels. The search starts with the frames whose temporal level is closest to that of the lost frame. For the frame in the center of the GOP, the key picture at the start of the GOP is used for concealment.
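
The concealment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a dyadic hierarchical GOP (key pictures at both GOP boundaries, temporal level 0), and it resolves the search order as "smallest frame distance first, then smallest temporal-level gap, then the earlier frame". The function names `temporal_level` and `concealment_frame` are hypothetical.

```python
def temporal_level(k, gop_size):
    """Temporal level of frame k, assuming a dyadic hierarchical GOP.

    Frames at multiples of gop_size are key pictures (level 0).
    gop_size is assumed to be a power of two.
    """
    if k % gop_size == 0:
        return 0
    num_levels = gop_size.bit_length() - 1       # log2(gop_size)
    trailing_zeros = (k & -k).bit_length() - 1   # largest power of 2 dividing k
    return num_levels - trailing_zeros


def concealment_frame(lost, received, gop_size):
    """Pick the frame used to conceal a lost frame, or None if none qualifies.

    Assumed ordering: nearest frame first (both directions), then the
    candidate whose temporal level is closest to the lost frame's level,
    then the earlier frame. Only same or lower temporal levels qualify.
    """
    lost_level = temporal_level(lost, gop_size)
    candidates = [
        k for k in received
        if k != lost and temporal_level(k, gop_size) <= lost_level
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda k: (abs(k - lost), lost_level - temporal_level(k, gop_size), k),
    )


if __name__ == "__main__":
    gop = 8
    all_frames = set(range(gop + 1))
    for lost in (3, 4):
        print(lost, "->", concealment_frame(lost, all_frames - {lost}, gop))
```

With a GOP of 8 frames and only the center frame (index 4) lost, the two key pictures are the only candidates of the same or lower temporal level, and the tie is resolved in favor of the key picture at the start of the GOP, which matches the rule stated in the text.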

As discussed in [11], the priority of the base layer (FGS0) of each temporal level decreases from the lowest to the highest temporal level, and each FGS layer for all the frames is considered as a single layer of even lesser priority. We will refer to this method as the Temporal-SNR scalable decoder distortion estimation (SDDE) method. Alternatively, both the base and the FGS layers of the reference frames can be used for the encoding and the reconstruction of the frames of higher temporal levels (non-key pictures). In that case, the base and the FGS layers of the reference frames (from the lower temporal levels) are considered equally important, and more important than the frame(s) (from a higher temporal level) to be motion-compensated and reconstructed. We will refer to this case as the SNR-Temporal SDDE method. Next, we present the derivations of these two SDDE methods.
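
To make the two priority orderings concrete, the following sketch groups (frame, layer) pairs into priority classes, highest priority first. It is a hedged illustration only: the input format (a dictionary mapping frame index to temporal level), the label `FGS1` for the single FGS layer, and the function names are assumptions introduced here, not taken from the source.

```python
def temporal_snr_classes(levels):
    """Base layers ordered by temporal level, then all FGS layers as one class.

    levels: dict mapping frame index -> temporal level (assumed input format).
    Returns a list of priority classes, highest priority first.
    """
    max_level = max(levels.values())
    classes = [
        [(f, "FGS0") for f in sorted(levels) if levels[f] == t]
        for t in range(max_level + 1)
    ]
    # The FGS layer of every frame forms a single class of lowest priority.
    classes.append([(f, "FGS1") for f in sorted(levels)])
    return classes


def snr_temporal_classes(levels):
    """Base and FGS layers of a temporal level share one class.

    Lower temporal levels (reference frames) come before higher ones.
    """
    max_level = max(levels.values())
    return [
        [(f, layer)
         for f in sorted(levels) if levels[f] == t
         for layer in ("FGS0", "FGS1")]
        for t in range(max_level + 1)
    ]


if __name__ == "__main__":
    levels = {0: 0, 4: 1, 2: 2, 6: 2}
    print(temporal_snr_classes(levels))
    print(snr_temporal_classes(levels))
```

In the first ordering an FGS layer of a key picture ranks below the base layer of every non-key picture, while in the second it ranks alongside the key picture's own base layer.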

5.1 Temporal-SNR SDDE

In the following derivation of the Temporal-SNR SDDE method, we consider a base layer and one FGS layer. We assume that the frames are converted into vectors via lexicographic ordering and the distortion of each macroblock (and hence, each frame) is the summation of the distortion estimated for all the pixels in the macroblock of that frame. Let $f_n^i$ denote the original value of pixel $i$ in frame $n$ and $\hat{f}_n^i$ denote its encoder reconstruction. The reconstructed pixel value at the decoder is denoted by $\tilde{f}_n^i$. The mean square error for this pixel is defined as [13]:

$$d_n^i = E\left[\left(f_n^i - \tilde{f}_n^i\right)^2\right] = \left(f_n^i\right)^2 - 2 f_n^i \, E\left[\tilde{f}_n^i\right] + E\left[\left(\tilde{f}_n^i\right)^2\right] \tag{6}$$

where $d_n^i$ is the distortion per pixel. The base layers of all the key pictures are assumed to be received error-free. The $s$th moment of the $i$th pixel of key picture $n$ is calculated as

$$E\left[\left(\tilde{f}_n^i\right)^s\right] = P_{nE1} \left(\hat{f}_{nB}^i\right)^s + \left(1 - P_{nE1}\right) \left(\hat{f}_{n(B,E1)}^i\right)^s \tag{7}$$
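
As a numerical illustration of how (6) and (7) fit together, the sketch below computes the first and second moments of a key-picture pixel and plugs them into the per-pixel distortion, then sums over a macroblock. It rests on an assumed reading of the symbols: `p_e1` stands for the probability that the FGS layer of the key picture is unavailable at the decoder, `f_base` for the reconstruction from the base layer alone, and `f_base_e1` for the reconstruction from the base layer plus the FGS layer. The function names and sample values are hypothetical.

```python
def key_picture_moment(p_e1, f_base, f_base_e1, s):
    """s-th moment of the decoder reconstruction of a key-picture pixel, as in (7).

    p_e1 is assumed to be the probability that the FGS layer is lost,
    so the decoder falls back to the base-layer-only reconstruction.
    """
    return p_e1 * f_base ** s + (1.0 - p_e1) * f_base_e1 ** s


def pixel_distortion(f_orig, first_moment, second_moment):
    """Expected squared error of one pixel, expanded as in (6)."""
    return f_orig ** 2 - 2.0 * f_orig * first_moment + second_moment


def macroblock_distortion(orig, base, base_e1, p_e1):
    """Sum of per-pixel distortions over the pixels of one macroblock."""
    total = 0.0
    for f, fb, fbe in zip(orig, base, base_e1):
        m1 = key_picture_moment(p_e1, fb, fbe, 1)
        m2 = key_picture_moment(p_e1, fb, fbe, 2)
        total += pixel_distortion(f, m1, m2)
    return total


if __name__ == "__main__":
    # Hypothetical values: original pixel 100, base-only reconstruction 90,
    # base+FGS reconstruction 98, FGS loss probability 0.1.
    m1 = key_picture_moment(0.1, 90, 98, 1)
    m2 = key_picture_moment(0.1, 90, 98, 2)
    print(pixel_distortion(100, m1, m2))
```

For these sample values the result is approximately 13.6, which agrees with the direct expectation 0.1 · (100 − 90)² + 0.9 · (100 − 98)², so only the first and second moments are needed to evaluate the distortion.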
