Video Streaming with Interactive Pan/Tilt/Zoom - High-Quality Visual Experience


transmit the entire picture while delivering the RoI with higher quality. Among such systems, some employ JPEG2000 with RoI support and conditional replenishment to exploit correlation among successive frames [16]. Parts of the image that are not replenished can be copied from the previous frame or from a background store.
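
The idea of conditional replenishment can be illustrated with a minimal sketch. It is not the JPEG2000-based scheme of [16]: the function name `replenish`, the block size, and the sum-of-absolute-differences threshold are all assumptions chosen for illustration, and no actual image coding is performed. Blocks that changed by more than the threshold are taken from the current frame; all other blocks are copied from the previous frame.

```python
def replenish(prev_frame, curr_frame, block=2, threshold=10):
    """Return (reconstructed_frame, replenished_blocks).

    Frames are equally sized lists of rows of integer pixel values.
    `block` and `threshold` are hypothetical tuning parameters.
    """
    height, width = len(curr_frame), len(curr_frame[0])
    # Start from a copy of the previous frame: unreplenished blocks stay as-is.
    recon = [row[:] for row in prev_frame]
    replenished = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            ys = range(by, min(by + block, height))
            xs = range(bx, min(bx + block, width))
            # Sum of absolute differences between current and previous block
            # (an assumed change criterion, not taken from the source).
            sad = sum(abs(curr_frame[y][x] - prev_frame[y][x])
                      for y in ys for x in xs)
            if sad > threshold:
                # Replenish: overwrite the block with current-frame pixels.
                for y in ys:
                    for x in xs:
                        recon[y][x] = curr_frame[y][x]
                replenished.append((by, bx))
    return recon, replenished
```

In a real system only the replenished blocks would be coded and transmitted, so the list returned here stands in for the data actually sent.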

In our own work, we have proposed a video transmission system for interactive pan/tilt/zoom [17]. This system crops the RoI sequence from the high-resolution video and encodes it using H.264/AVC. The RoI cropping is adapted to yield efficient motion compensation in the video encoder. The RoI adjustment is confined so that the user does not notice the manipulation and still experiences accurate RoI control. The normal mode of operation for this system is streaming live content, but we also allow the user to rewind and play back older video. Note that in this second mode of operation, the high-resolution video is decoded prior to cropping the RoI sequence. Although the system is efficient in terms of transmitted bit-rate, its drawback is that RoI video encoding has to be invoked for each user, which limits the system to a few users. This system targets remote surveillance, in which the number of simultaneous users is likely to be smaller than in other applications such as interactive TV.

Video coding for spatial random access presents a special challenge. To achieve good compression efficiency, video compression schemes typically exploit correlation among successive frames. This is accomplished through motion-compensated interframe prediction [18, 19, 20]. However, interframe prediction makes it difficult to provide random access for spatial browsing within the scene, because decoding a block of pixels requires that the reference-frame blocks used by the predictor have already been decoded. These reference-frame blocks might lie outside the RoI and might not have been transmitted and/or decoded earlier.
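
The problem described above can be made concrete with a small sketch. It assumes a simplified model in which each block has a single integer motion vector pointing into the previous frame and the client holds only the pixels inside the previously received RoI; the names `Rect`, `reference_block`, and `decodable_from_roi` are hypothetical and do not correspond to any codec API.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixels (top-left corner, width, height)."""
    x: int
    y: int
    w: int
    h: int


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def reference_block(block: Rect, mv: Tuple[int, int]) -> Rect:
    """Region of the previous frame that the motion vector points to."""
    dx, dy = mv
    return Rect(block.x + dx, block.y + dy, block.w, block.h)


def decodable_from_roi(block: Rect, mv: Tuple[int, int], prev_roi: Rect) -> bool:
    """A block is decodable only if its whole reference region was inside
    the RoI the client received for the previous frame (assumed model)."""
    return contains(prev_roi, reference_block(block, mv))
```

For example, with a previous RoI of `Rect(64, 64, 128, 128)`, a 16x16 block at the RoI's top-left corner is decodable with a zero motion vector, but not with a motion vector of `(-8, 0)`, since part of its reference region then falls outside the received area.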

Coding, transmission and rendering of high-resolution panoramic videos using MPEG-4 are proposed in [21, 22]. A limited part of the entire scene is transmitted to the client depending on the chosen viewpoint. Only intraframe coding is used to allow random access. The scene is coded into independent slices. The authors mention the possibility of employing interframe coding to gain more compression efficiency. However, they note that this involves transmitting slices from the past if the current slice requires those for its decoding. A longer intraframe period entails significant complexity for slices from the later frames in the group of pictures (GOP), as this “dependency chain” grows.
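
How such a dependency chain grows with the position in the GOP can be sketched under a purely illustrative assumption: slices are arranged in a one-dimensional row, frame 0 is intra-coded, and each slice in a later frame is predicted from the slices of the previous frame within `reach` positions of it. This prediction pattern and the function name `slices_needed` are hypothetical and are not taken from [21, 22].

```python
def slices_needed(target_frame, target_slice, num_slices, reach=1):
    """Return {frame_index: sorted slice indices} required to decode
    one slice of `target_frame`, walking back to the intra frame 0.

    Assumed model: a slice depends on the previous frame's slices whose
    index differs from its own by at most `reach`.
    """
    needed = {target_frame: {target_slice}}
    for frame in range(target_frame, 0, -1):
        refs = set()
        for s in needed[frame]:
            lo = max(0, s - reach)
            hi = min(num_slices - 1, s + reach)
            refs.update(range(lo, hi + 1))
        # Everything referenced by this frame must be available one frame earlier.
        needed[frame - 1] = refs
    return {f: sorted(s) for f, s in needed.items()}


def total_slices(needed):
    """Total number of slices that must be transmitted and decoded."""
    return sum(len(slices) for slices in needed.values())
```

Under this model, decoding a single slice two frames after the intra frame already requires three slices from the preceding frame and five from the intra frame, and the set keeps widening as the intraframe period gets longer.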

Multi-View Images/Videos. Interactive streaming systems that provide virtual fly-around in the scene employ novel-view generation to render views of the scene from arbitrary viewpoints. With these systems, the user can navigate the scene more freely [23, 24, 25]. These systems typically employ image-based rendering (IBR), a technique that generates the novel view from multiple views of the scene recorded using multiple cameras [26, 27]. Note that in these applications, the scene itself might or might not be evolving in time. Transmitting arbitrary views from the multi-view data set on the fly also entails random access issues similar to those arising when transmitting arbitrary regions in interactive pan/tilt/zoom. Interframe coding for compressing successive images in time as
