Video Streaming with Interactive Pan/Tilt/Zoom - High-Quality Visual Experience

well as from neighboring views can achieve higher compression efficiency but can lead to undesirable dependencies when accessing random views. A large body of work employs hybrid video coding for compressing multi-view data-sets [28, 29, 30, 31, 32]. These studies highlight the trade-off among storage requirement, mean transmission bit-rate and decoding complexity. Recently, an analytical framework was proposed for optimizing the coding structure of multi-view data-sets [33]. The framework allows multiple representations of a picture, for example, representations compressed using different reference pictures. The optimization not only finds the best coding structure but also determines the best set of coded pictures to transmit for a given viewing path. The framework can accommodate constraints such as a limited step-size for view switching, view switching permitted only during certain frame-intervals, and a cap on the length of the burst of reference frames that are used for decoding a viewed frame but are not themselves displayed. The framework can minimize a weighted sum of the expected transmission bit-rate and the cost of storing the compressed pictures.
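The weighted-sum objective can be illustrated with a minimal sketch. This is not the framework of [33]; it only shows how a weight on storage shifts the preferred coding structure. The candidate names, the cost figures and the `storage_weight` parameter are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingStructure:
    """A candidate coding structure with hypothetical, precomputed costs."""
    name: str
    expected_rate: float  # expected transmission bit-rate (arbitrary units)
    storage: float        # storage cost of all compressed pictures (arbitrary units)


def weighted_cost(structure, storage_weight):
    """Weighted sum of expected transmission bit-rate and storage cost."""
    return structure.expected_rate + storage_weight * structure.storage


def best_structure(candidates, storage_weight):
    """Return the candidate with the smallest weighted cost."""
    return min(candidates, key=lambda s: weighted_cost(s, storage_weight))


# Invented numbers: structure_c transmits least but stores most.
candidates = [
    CodingStructure("structure_a", expected_rate=900.0, storage=100.0),
    CodingStructure("structure_b", expected_rate=400.0, storage=150.0),
    CodingStructure("structure_c", expected_rate=300.0, storage=600.0),
]

if __name__ == "__main__":
    for weight in (0.0, 1.0, 20.0):
        print(weight, best_structure(candidates, weight).name)
```

With a weight of zero only the transmission bit-rate matters, so the structure that stores the most representations wins; as the weight grows, structures with smaller storage cost are preferred.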

The video compression standard H.264/AVC defines two new slice types, called SP and SI slices. Using these slice types, it is possible to create multiple representations of a video frame using different reference frames. As in the solutions described above, the representation to be streamed is chosen according to the reference frames available at the decoder. The novelty, however, is that the reconstruction is guaranteed to be identical regardless of the representation decoded. This drastically reduces the total number of representations that must be stored. SP frames have been used for interactive streaming of static light fields [34, 35]. Another solution to the random access problem associated with multi-view data-sets is based on distributed source coding (DSC) [36, 37]. In this solution, an interframe coded picture is represented using enough parity bits to yield an identical reconstruction irrespective of the reference frame used by the decoder. Multiple representations therefore need not be stored; however, the number of parity bits is determined by the reference frame having the least correlation with the frame to be coded. Like some of the prior work based on hybrid video coding for multi-view data-sets mentioned above, recent work based on DSC also explores the trade-off between transmission bit-rate and storage requirement [38].
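The storage versus transmission trade-off between the two approaches can be sketched with a toy model. The per-reference bit costs below are invented, and the model assumes, as a simplification, that the DSC representation costs as much as the most expensive (least correlated) reference would; real codecs behave less neatly.

```python
def multi_representation_costs(bits_per_reference, reference_at_decoder):
    """Store one representation per possible reference frame and
    transmit only the one matching the reference held by the decoder."""
    storage = sum(bits_per_reference.values())
    transmitted = bits_per_reference[reference_at_decoder]
    return storage, transmitted


def dsc_costs(bits_per_reference):
    """Store a single representation sized for the least correlated
    reference (toy assumption: the largest per-reference cost) and
    transmit that same amount whatever reference the decoder holds."""
    worst_case = max(bits_per_reference.values())
    return worst_case, worst_case


# Hypothetical bit costs for coding one picture against each reference.
bits_per_reference = {
    "left_view": 1200,
    "previous_frame": 800,
    "right_view": 1500,
}

if __name__ == "__main__":
    print(multi_representation_costs(bits_per_reference, "previous_frame"))
    print(dsc_costs(bits_per_reference))
```

In this toy setting the multiple-representation scheme stores more but can transmit less when the decoder holds a well-correlated reference, whereas the single DSC representation stores less but always pays the worst-case rate.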

2.2 Navigation Path Prediction

A simple user-input device, for example a computer mouse, typically senses position. More sophisticated devices like game-controllers can also measure velocity and/or acceleration. Studies on view trajectory prediction have been conducted in the context of Virtual Reality [39] and networked multi-player video games [40]. A common navigation path prediction technique, dead reckoning, predicts the future path by assuming that the user maintains the current velocity. The velocity can be either read from the input device or computed from successive position measurements.
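Dead reckoning as described here can be sketched in a few lines. The function names, the two-dimensional (pan, tilt) position and the sampling interval are assumptions made for illustration; the velocity is computed from two successive position measurements and then held constant over the prediction horizon.

```python
def estimate_velocity(previous_position, current_position, dt):
    """Finite-difference velocity from two successive position samples
    taken dt seconds apart."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple(
        (curr - prev) / dt
        for prev, curr in zip(previous_position, current_position)
    )


def dead_reckon(position, velocity, horizon):
    """Predict the position `horizon` seconds ahead, assuming the user
    maintains the current velocity."""
    return tuple(p + v * horizon for p, v in zip(position, velocity))


if __name__ == "__main__":
    # Hypothetical (pan, tilt) samples taken 0.5 s apart.
    velocity = estimate_velocity((100.0, 50.0), (104.0, 48.0), dt=0.5)
    print(velocity)
    print(dead_reckon((104.0, 48.0), velocity, horizon=2.0))
```

If the input device reports velocity directly, `estimate_velocity` can be skipped and the measured value passed to `dead_reckon`.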
