Video Streaming with Interactive Pan/Tilt/Zoom - High-Quality Visual Experience

earlier. This is different from prior work [61] employing a background pyramid, in which the encoder uses only those parts of the background for prediction that exist in the decoder's multi-resolution background pyramid. In [61], the encoder mimics the decoder, which builds a background pyramid out of all previously received frames. Note that the camera is likely to be static in such applications, since a moving camera might hamper the interactive browsing experience. Background extraction is generally easier with a static camera. Background extraction algorithms, as well as the detection and update of changed background portions, have been studied previously, for example in [62]. Note that the improved coding scheme entails transmitting some I slices from the background frame that might be required for decoding the current high-resolution P slice. Nevertheless, the cost of doing this is amortized over the streaming session. A bit-rate reduction of 70-80% can be obtained with this improvement while retaining efficient random access.

Optimal Slice Size. Generally, whenever tiles or slices are employed, choosing the tile size or slice size poses the following trade-off. On one hand, a smaller slice size reduces the overhead of transmitted pixels. The overhead consists of pixels that have to be transmitted due to the coarse slice grid but are not used for rendering the display. On the other hand, reducing the slice size worsens the coding efficiency. This is due to the increased number of headers and the inability to exploit correlation across slices. The optimal slice size depends on the RoI display dimensions, the dimensions of the high-spatial-resolution video, the content itself and the distribution of the user-selected zoom factor. Nevertheless, we have demonstrated in prior work that stochastic analysis can estimate the expected number of transmitted pixels per frame [56]. This quantity, denoted by $\eta(s_w, s_h)$, is a function of the slice width, $s_w$, and the slice height, $s_h$. The average number of bits per pixel required to encode the high-resolution video frame, denoted by $\psi(s_w, s_h)$, can also be observed or estimated as a function of the slice size. The optimal slice size is the one that minimizes the expected number of bits transmitted per frame,

$$(s_w^{\mathrm{opt}}, s_h^{\mathrm{opt}}) = \arg\min_{(s_w, s_h)} \eta(s_w, s_h) \times \psi(s_w, s_h). \qquad (1)$$
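The minimization in (1) can be illustrated with a small grid search. The sketch below is only a toy under stated assumptions and is not the stochastic model of [56]. The function names are hypothetical. The stand-in for η assumes the RoI position is uniformly distributed over pixel positions and ignores frame borders. The stand-in for ψ is a made-up model in which a fixed header cost per slice is spread over the slice area, so it captures only the header part of the coding-efficiency loss.

```python
from itertools import product


def expected_pixels(s_w, s_h, roi_w, roi_h):
    """Stand-in for eta(s_w, s_h): expected transmitted pixels per frame.

    Assumption (not the model of [56]): the RoI's top-left corner is uniformly
    distributed over pixel positions and frame borders are ignored. A window of
    r pixels then touches 1 + (r - 1) / s slices of length s on average, which
    amounts to s + r - 1 transmitted pixels along that dimension.
    """
    return (s_w + roi_w - 1) * (s_h + roi_h - 1)


def bits_per_pixel(s_w, s_h, base_bpp=0.10, header_bits=400.0):
    """Stand-in for psi(s_w, s_h): hypothetical bits per pixel.

    Assumption: a constant base rate plus a fixed per-slice header cost spread
    over the slice area. In practice psi would be observed or estimated from
    encodings of the actual content.
    """
    return base_bpp + header_bits / (s_w * s_h)


def expected_bits(s_w, s_h, roi_w, roi_h):
    """Cost in (1): expected pixels times bits per pixel."""
    return expected_pixels(s_w, s_h, roi_w, roi_h) * bits_per_pixel(s_w, s_h)


def optimal_slice_size(widths, heights, roi_w, roi_h):
    """Exhaustive search over candidate (s_w, s_h) pairs for the minimum cost."""
    return min(
        product(widths, heights),
        key=lambda s: expected_bits(s[0], s[1], roi_w, roi_h),
    )


if __name__ == "__main__":
    sizes = [16, 32, 64, 128, 256, 512]  # hypothetical candidate slice sizes
    print(optimal_slice_size(sizes, sizes, roi_w=480, roi_h=240))
```

With these stand-ins, very small slices are penalized by the header term and very large slices by the pixel overhead, which is the trade-off described above. The numerical optimum depends entirely on the assumed parameters.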

The results in our earlier work show that the optimal slice size can be determined accurately without capturing user-interaction trajectories [56]. Although the model predicts the optimal slice size accurately, it can underestimate or overestimate the transmitted bit-rate. This is because the popular slices that constitute the salient objects in the video might entail a higher or lower bit-rate than the average. Also, the location of the objects can bias the pixel overhead to the high or low side, whereas the model uses the average overhead. Note that the cost function in (1) can be replaced with a Lagrangian cost function given by the weighted sum of the average transmission bit-rate and the incurred storage cost. The storage cost can be represented by an appropriate constant multiplying $\eta(s_w, s_h)$.
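The underestimation and overestimation of the transmitted bit-rate mentioned above can be seen in a small numeric illustration. The per-slice sizes and request probabilities below are made up, and both function names are hypothetical. The sketch only contrasts an estimate based on the average slice size with one that weights each slice by how often it is requested.

```python
def model_estimate(slice_bits, request_prob):
    """Average-based estimate: expected number of requested slices
    multiplied by the mean number of bits per slice."""
    mean_bits = sum(slice_bits) / len(slice_bits)
    return sum(request_prob) * mean_bits


def actual_bits(slice_bits, request_prob):
    """Expected bits when each slice's own size is weighted by the
    probability that it is requested."""
    return sum(b * p for b, p in zip(slice_bits, request_prob))


if __name__ == "__main__":
    # Hypothetical numbers: four slices, the first one covers a salient object
    # and is requested far more often than the others.
    request_prob = [0.9, 0.2, 0.2, 0.2]

    costly_object = [9000, 3000, 3000, 3000]  # popular slice is expensive
    cheap_object = [1000, 5000, 5000, 5000]   # popular slice is cheap

    for bits in (costly_object, cheap_object):
        print(model_estimate(bits, request_prob), actual_bits(bits, request_prob))
```

When the popular slice is more expensive than average, the average-based estimate falls below the weighted one, and when it is cheaper the estimate lies above it.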
