As mentioned previously, SVC produces video frames which are partitioned
into FGS layers. We assume that each layer of each frame is packetized into
fixed-size packets of size γ for transmission. At the receiver, any unrecoverable
error in a packet causes the packet to be dropped, and hence the layer to which
the packet belongs is lost. We assume that the channel
coding rate and constellation used for the transmission of the base layers of all
key pictures are such that these layers are received error-free. Because SVC
encoding and decoding are performed on a GOP basis, the frames
within a GOP can be used for error concealment. When a frame is lost,
temporal error concealment at the decoder replaces it with the nearest
available frame, searching in both decreasing and increasing sequential order
but only among frames of the same or lower temporal levels. The search starts
with the frames whose temporal level is closest to that of the lost
frame. For the frame in the center of the GOP, the key picture at the start of
the GOP is used for concealment.
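The concealment rule above can be sketched in a few lines of Python. This is only one possible reading of the rule, not the authors' implementation. It assumes a dyadic hierarchical GOP whose size is a power of two, with frame 0 being the key picture at the start of the GOP and frame gop_size the key picture at its end. The function names, the ordering of the search criteria (distance first, then temporal-level gap) and the tie-break toward the earlier frame are all assumptions made for illustration.

```python
def temporal_level(k, gop_size):
    """Temporal level of frame k (0..gop_size) in a dyadic hierarchical GOP.

    Assumption: gop_size is a power of two; key pictures (k = 0 and
    k = gop_size) have temporal level 0.
    """
    if k % gop_size == 0:
        return 0
    total_levels = gop_size.bit_length() - 1      # log2(gop_size)
    trailing_zeros = (k & -k).bit_length() - 1    # largest power of two dividing k
    return total_levels - trailing_zeros


def concealment_frame(lost, available, gop_size):
    """Pick a frame to conceal the lost frame, or None if no candidate exists.

    Candidates are available frames of the same or a lower temporal level.
    Assumed ordering: nearest frame first, then the smallest temporal-level
    gap, then the earlier frame.
    """
    lost_level = temporal_level(lost, gop_size)
    candidates = [
        k for k in available
        if k != lost and temporal_level(k, gop_size) <= lost_level
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda k: (abs(k - lost), lost_level - temporal_level(k, gop_size), k),
    )


# Center frame of a GOP of size 8: both key pictures are equally distant,
# so the tie-break selects the key picture at the start of the GOP.
print(concealment_frame(4, {0, 1, 2, 3, 5, 6, 7, 8}, 8))
```

With this ordering, the center frame falls back to the key picture at the start of the GOP, which agrees with the special case stated in the text.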
As discussed in [11], the priority of the base layer (FGS0) of each temporal
level decreases from the lowest to the highest temporal level, and each FGS layer
for all the frames is considered as a single layer of even lower priority. We will
refer to this method as the Temporal-SNR scalable decoder distortion estimation
(SDDE) method. Alternatively, we can consider both the base and the FGS layers
of the reference frames to be used for the encoding and the reconstruction of the
frames of higher temporal levels (non-key pictures). In such a case, both the base
and the FGS layers of the reference frames (from the lower temporal levels) are
considered of the same importance, and of higher importance than the frame(s)
(from a higher temporal level) to be motion-compensated and reconstructed.
We will refer to this case as the SNR-Temporal SDDE method. Next we
present the derivations of these two SDDE methods.
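To make the difference between the two priority orderings concrete, the following sketch lists the layers from highest to lowest priority for a base layer and one FGS layer. It is an illustrative reading of the description above, not code from the source. The function names and the tuple representation are hypothetical, and in the second ordering each inner list represents a group of layers treated as equally important.

```python
def temporal_snr_order(num_levels):
    """Temporal-SNR ordering: base layers by increasing temporal level,
    followed by the FGS layer of all frames as one lower-priority layer."""
    order = [("base", t) for t in range(num_levels)]
    order.append(("fgs", "all"))
    return order


def snr_temporal_order(num_levels):
    """SNR-Temporal ordering: at each temporal level the base and FGS
    layers form one group of equal importance, ranked above every
    higher temporal level."""
    return [[("base", t), ("fgs", t)] for t in range(num_levels)]


print(temporal_snr_order(3))
print(snr_temporal_order(3))
```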
5.1 Temporal-SNR SDDE
In the following derivation of the Temporal-SNR SDDE method, we consider a
base layer and one FGS layer. We assume that the frames are converted into
vectors via lexicographic ordering and the distortion of each macroblock (and
hence, each frame) is the summation of the distortion estimated for all the pixels
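in the macroblock of that frame. Let $f_n^i$ denote the original value of pixel $i$ in
frame $n$ and $\hat{f}_n^i$ denote its encoder reconstruction. The reconstructed pixel value
at the decoder is denoted by $\tilde{f}_n^i$. The mean square error for this pixel is defined
as [13]:
\[
d_n^i = E\left\{\left(f_n^i - \tilde{f}_n^i\right)^2\right\} = \left(f_n^i\right)^2 - 2 f_n^i\, E\left\{\tilde{f}_n^i\right\} + E\left\{\left(\tilde{f}_n^i\right)^2\right\} \tag{6}
\]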
where $d_n^i$ is the distortion per pixel. The base layers of all the key pictures are
assumed to be received error-free. The $s$th moment of the $i$th pixel of key
picture $n$ is calculated as
\[
E\left\{\left(\tilde{f}_n^i\right)^s\right\} = P_{nE1} \left(\hat{f}_{nB}^i\right)^s + \left(1 - P_{nE1}\right) \left(\hat{f}_{n(B,E1)}^i\right)^s \tag{7}
\]
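As a numerical sanity check of Eqs. (6) and (7), the sketch below computes the first and second moments of the decoder reconstruction of a key-picture pixel and plugs them into the distortion expression. It assumes that P_nE1 is the probability that the first FGS layer of frame n is lost, so the decoder sees the base-only reconstruction with probability P_nE1 and the base-plus-FGS reconstruction otherwise. The function names and pixel values are made up for illustration.

```python
def decoder_moment(p_loss_e1, recon_base, recon_base_e1, s):
    """s-th moment of the decoder reconstruction of a key-picture pixel (Eq. 7).

    p_loss_e1     : assumed probability that the FGS layer E1 is lost
    recon_base    : encoder reconstruction from the base layer only
    recon_base_e1 : encoder reconstruction from the base layer plus E1
    """
    return p_loss_e1 * recon_base ** s + (1.0 - p_loss_e1) * recon_base_e1 ** s


def expected_distortion(original, first_moment, second_moment):
    """Expected squared error of a pixel from its first two moments (Eq. 6)."""
    return original ** 2 - 2.0 * original * first_moment + second_moment


# Illustrative values only.
f, f_base, f_base_e1, p = 120.0, 112.0, 118.0, 0.1

m1 = decoder_moment(p, f_base, f_base_e1, 1)
m2 = decoder_moment(p, f_base, f_base_e1, 2)
d = expected_distortion(f, m1, m2)

# Direct expectation over the two possible decoder outcomes.
d_direct = p * (f - f_base) ** 2 + (1.0 - p) * (f - f_base_e1) ** 2

# The moment-based form and the direct expectation agree up to rounding.
assert abs(d - d_direct) < 1e-6
```

Computing the distortion from moments, rather than from the enumerated outcomes, is what allows the estimate to be propagated from frame to frame: only the first and second moments of each pixel need to be stored.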
 