Video Streaming with Interactive Pan/Tilt/Zoom - High-Quality Visual Experience

In contrast to network-layer IP multicast, P2P streaming implements the multicasting logic in software at the end-hosts rather than in routers inside the network [49]. Unlike IP multicast, this application-layer software can be widely deployed with little investment. Although the P2P approach generally results in more duplicate packets and less efficient routing than IP multicast, the benefits outweigh the inefficiencies. The source and each peer can respond to local retransmission requests and perform sophisticated packet scheduling to maximize the experience of downstream peers [50].

P2P streaming systems can be broadly classified into mesh-pull and tree-push systems [51]. The design of mesh-pull systems evolved from P2P file-sharing systems. In these systems, a peer advertises the chunks of data that it holds and serves requests from other peers for those chunks. Tree-push systems, on the other hand, distribute data using one or more complementary trees. After finding its place inside a distribution tree, a peer generally maintains its association with its parent and its children, and relays data without waiting for requests from the children. Generally, tree-push systems result in fewer duplicate packets, lower end-to-end delay and less delay jitter [52, 53]. These traits are beneficial for interactive streaming systems, where selected sub-streams of the coded content are required on the fly. A tree-based P2P protocol has recently been proposed for interactive streaming of dynamic light fields [54, 55]. Early results demonstrate that the system can support many more users with the same server resources than traditional unicast client-server streaming [55].
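The difference between the two classes can be illustrated with a minimal sketch. It is not taken from any of the cited protocols: the class names `TreePeer` and `MeshPeer` and their methods are hypothetical, and networking, peer discovery, and failure handling are left out entirely. A tree-push peer forwards each packet to its children as soon as it arrives, whereas a mesh-pull peer only advertises what it holds and waits to be asked.

```python
class TreePeer:
    """Hypothetical tree-push peer: relays every packet to its children unasked."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.received = []

    def add_child(self, child):
        self.children.append(child)

    def push(self, packet):
        # Store the packet locally, then forward it down the tree immediately.
        self.received.append(packet)
        for child in self.children:
            child.push(packet)


class MeshPeer:
    """Hypothetical mesh-pull peer: advertises its chunks and serves requests."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk id -> payload

    def advertise(self):
        # The set of chunk ids this peer can currently offer.
        return set(self.chunks)

    def serve(self, chunk_id):
        # Answer an explicit request from another peer (None if not held).
        return self.chunks.get(chunk_id)

    def pull_from(self, neighbor, wanted):
        # Request only chunks the neighbor advertises and we still lack.
        available = neighbor.advertise()
        for chunk_id in sorted(wanted):
            if chunk_id in available and chunk_id not in self.chunks:
                self.chunks[chunk_id] = neighbor.serve(chunk_id)
```

In the tree-push case a single call to `push` at the root reaches every descendant exactly once, which is consistent with the lower duplication and delay noted above. In the mesh-pull case each transfer is preceded by an advertisement and a request.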

3 Spatial-Random-Access-Enabled Video Coding

We proposed a spatial-random-access-enabled video coding scheme, shown in Fig. 2, in our earlier work [56]. The coded representation consists of multiple resolution layers. The thumbnail video constitutes the base layer and is coded with H.264/AVC using I, P and B pictures. The reconstructed base-layer video frames are upsampled by a suitable factor and used as the prediction signal for encoding the video of the higher resolution layers. Each frame belonging to a higher resolution layer is coded using a grid of rectangular P slices. Employing upward prediction from only the thumbnail enables efficient random access to local regions within any spatial resolution. For a given frame interval, the client's display is rendered by transmitting the corresponding frame from the base layer and a few P slices from exactly one higher resolution layer. Slices are transmitted from the resolution layer that corresponds most closely to the user's current zoom factor. At the client side, the corresponding RoI from this resolution layer is resampled to match the user's zoom factor. Thus, smooth zoom control can be rendered despite storing only a few dyadically spaced resolution layers at the server. Note that the encoding takes place once and generates a repository of slices. Relevant slices can be served to several clients depending on their individual RoIs. The encoding can take place either live or offline beforehand.
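The per-frame selection described above can be sketched as follows. This is a rough illustration under stated assumptions, not the scheme's actual implementation. The function names are hypothetical, and the slice grid is assumed to be uniform with a fixed slice size in pixels. "Closest" is measured here in the log2 domain, which suits dyadically spaced layers, but the text does not specify the distance measure. The RoI is given in pixel coordinates of the chosen layer.

```python
import math


def closest_layer(zoom, layer_scales):
    """Index of the layer whose scale is nearest to `zoom`.

    Distance is measured in the log2 domain (an assumption of this sketch).
    Ties resolve to the lower index.
    """
    return min(
        range(len(layer_scales)),
        key=lambda i: abs(math.log2(zoom / layer_scales[i])),
    )


def slices_for_roi(x, y, w, h, slice_w, slice_h):
    """(row, col) indices of all grid slices that overlap the RoI.

    (x, y) is the top-left corner and (w, h) the size of the RoI, all in
    pixels of the chosen resolution layer; w and h must be positive.
    """
    first_col = x // slice_w
    last_col = (x + w - 1) // slice_w
    first_row = y // slice_h
    last_row = (y + h - 1) // slice_h
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def client_resample_factor(zoom, layer_scale):
    """Factor by which the client resamples the decoded RoI to match the zoom."""
    return zoom / layer_scale
```

For example, with hypothetical layer scales of 2, 4 and 8 relative to the thumbnail, a zoom factor of 3 selects the layer with scale 4, and the client then resamples the decoded region by a factor of 0.75. Only the slices returned by `slices_for_roi` need to be sent in addition to the base-layer frame.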
