Video Streaming with Interactive Pan/Tilt/Zoom - High-Quality Visual Experience


4 Pre-fetching Based on RoI Prediction

The rationale behind pre-fetching is to lower the latency of interaction. Imagine that frame number n is being rendered on the screen. At this point, the user's RoI selection up to frame n has been observed. The goal is to predict the user's RoI at frame n + d ahead of time and pre-fetch the relevant slices.

Extrapolating the Navigation Trajectory. In our own work [63, 64], we have used an autoregressive moving average (ARMA) model to estimate the velocity of the RoI center:

$$v_t = \alpha\, v_{t-1} + (1 - \alpha)(p_t - p_{t-1}), \tag{2}$$

where the co-ordinates of the RoI center, observed up to frame n, are given by $p_t = (x_t, y_t)$ for $t = 0, 1, \ldots, n$. The predicted RoI center co-ordinates $p_{n+d} = (x_{n+d}, y_{n+d})$ for frame n + d are given by

$$p_{n+d} = p_n + d\, v_n, \tag{3}$$

suitably adjusted if the RoI happens to veer off the extent of the video frame. The prediction lookahead, d frames, should be chosen by taking into account network delays and the desired interaction latency. The parameter α above trades off responsiveness to the user's RoI trajectory against smoothness of the predicted trajectory.
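As a rough illustration of Eqs. (2) and (3), the following Python sketch filters the observed RoI centers into a velocity estimate and extrapolates d frames ahead. The function names, the zero initial velocity, and the clamping rule (keeping the whole RoI rectangle inside the frame) are assumptions made for this example; the text only says the prediction is "suitably adjusted" at the frame boundary.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_velocity(centers: List[Point], alpha: float) -> Point:
    """Run the recursion of Eq. (2) over the observed centers p_0..p_n."""
    vx, vy = 0.0, 0.0  # assumption: initial velocity is zero (not stated in the text)
    for (x_prev, y_prev), (x, y) in zip(centers, centers[1:]):
        vx = alpha * vx + (1.0 - alpha) * (x - x_prev)
        vy = alpha * vy + (1.0 - alpha) * (y - y_prev)
    return vx, vy


def predict_center(
    centers: List[Point],
    alpha: float,
    d: int,
    frame_size: Tuple[float, float],
    roi_size: Tuple[float, float],
) -> Point:
    """Extrapolate the RoI center d frames ahead as in Eq. (3)."""
    vx, vy = estimate_velocity(centers, alpha)
    x_n, y_n = centers[-1]
    x = x_n + d * vx
    y = y_n + d * vy

    # Assumed adjustment: clamp the center so the RoI stays inside the frame.
    frame_w, frame_h = frame_size
    roi_w, roi_h = roi_size
    x = min(max(x, roi_w / 2), frame_w - roi_w / 2)
    y = min(max(y, roi_h / 2), frame_h - roi_h / 2)
    return x, y
```

With α = 0 the estimate is simply the most recent displacement, so the prediction reacts immediately to the user; values of α closer to 1 weight older displacements more heavily and give a smoother but slower-reacting trajectory.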

Video-Content-Aware RoI Prediction. Note that the approach described above is agnostic of the video content. We have explored video-content-aware RoI prediction that analyzes the motion of objects in the video to improve the RoI prediction [63, 64]. The transmission system in this work employs the multi-resolution video coding scheme presented in Sect. 3. The transmission system ensures that some future thumbnail video frames are buffered at the client's side. Figure 4 illustrates client-side video-content-aware RoI prediction. The following are some approaches explored in [63]:

1. Optical flow estimation techniques, for example the Kanade-Lucas-Tomasi (KLT) feature tracker [65], can find feature points in buffered thumbnail frames and track the features in successive frames. The feature closest to the RoI center in frame n can be followed up to frame n + d. The location of the tracked feature point can be made the center of the predicted RoI in frame n + d, or the predicted RoI can be chosen such that the tracked feature point appears in the same relative location. Alternatively, a smoother trajectory can be obtained by making a change to the RoI center only if the feature point moves more than a certain distance away from the RoI center.

2. Depending on the chosen optical flow estimation technique, the above approach can be computationally intensive. An alternative approach exploits the motion vectors (MVs) contained in the buffered thumbnail bit-stream. The MVs are used to find a plausible propagation of the RoI center pixel in every subsequent frame up to frame n + d. The location of the propagated pixel in frame n + d is deemed to be the center of the predicted RoI. Although the MVs are rate-distortion optimized and might not reflect true motion, the results are competitive with those obtained with the KLT
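The smoothing rule mentioned at the end of approach 1 (change the RoI center only when the tracked feature point has moved more than a certain distance away from it) can be sketched as follows. This is a minimal illustration, not the implementation from [63]: the function name and parameters are hypothetical, the feature track is assumed to come from some tracker such as KLT, and re-centering the RoI exactly on the feature point once the threshold is exceeded is an assumption.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def follow_feature_with_dead_zone(
    roi_center: Point,
    feature_track: Iterable[Point],
    max_dist: float,
) -> List[Point]:
    """Return one RoI center per tracked frame, updated only outside a dead zone.

    roi_center:    RoI center in frame n.
    feature_track: positions of the tracked feature in frames n+1 .. n+d
                   (assumed to be supplied by a feature tracker).
    max_dist:      distance threshold beyond which the RoI center is moved.
    """
    cx, cy = roi_center
    centers: List[Point] = []
    for fx, fy in feature_track:
        # Move the RoI only if the feature has drifted too far from its center.
        if math.hypot(fx - cx, fy - cy) > max_dist:
            cx, cy = fx, fy  # assumption: re-center directly on the feature
        centers.append((cx, cy))
    return centers
```

The last element of the returned list would serve as the predicted RoI center for frame n + d. A larger `max_dist` yields fewer, larger jumps of the RoI; a value of zero reduces to following the feature point in every frame.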
