Spatiotemporal Video Upscaling Using Motion-Assisted Steering Kernel (MASK) Regression - High-Quality Visual Experience


camera projection violate the motion model. In practice, errors in motion vectors are inevitable, and it is important to provide a fall-back mechanism in order to avoid visual artifacts.

Quantization of Orientation Map

The estimation of spatial orientations, or steering covariance matrices C_i in (18), involves singular value decomposition (SVD), which is computationally expensive. Instead of using the SVD, we use a pre-defined lookup table containing a set of candidate covariance matrices, and locally select an appropriate matrix from the table. Since the lookup table contains only stable (invertible) covariance matrices, the estimation process remains robust.
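The lookup-table idea can be sketched as follows. This is only an illustration under stated assumptions: the quantization into `n_angles` orientations and a few `elongations`, the use of a 2 × 2 gradient structure tensor as the local measurement, and the Frobenius-distance selection rule are all hypothetical choices, not the table or selection criterion used in the chapter.

```python
import numpy as np

def build_covariance_table(n_angles=8, elongations=(1.0, 2.0, 4.0)):
    """Precompute candidate 2x2 covariance matrices (hypothetical quantization)."""
    table = []
    for k in range(n_angles):
        theta = np.pi * k / n_angles            # orientation in [0, pi)
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])         # rotation matrix
        for rho in elongations:
            # Eigenvalues rho and 1/rho are strictly positive,
            # so every table entry is invertible (determinant 1).
            L = np.diag([rho, 1.0 / rho])
            table.append(R @ L @ R.T)
    return np.stack(table)                      # shape: (n_angles * len(elongations), 2, 2)

def select_covariance(gx, gy, table, eps=1e-8):
    """Return the index of the table entry closest to the local structure tensor."""
    # 2x2 structure tensor from the local gradients; no SVD is needed.
    J = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    J = J + eps * np.eye(2)                     # keep J positive definite
    J = J / np.sqrt(np.linalg.det(J))           # unit determinant, like the table entries
    dists = np.sum((table - J) ** 2, axis=(1, 2))  # squared Frobenius distances
    return int(np.argmin(dists))

table = build_covariance_table()
# A window whose gradients point purely along x (a vertical edge).
idx = select_covariance(np.ones((3, 3)), np.zeros((3, 3)), table)
```

Because the table is built once and every entry has strictly positive eigenvalues, the per-pixel work reduces to a few sums and a nearest-entry search, and the selected matrix is always invertible.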

Adaptive Regression Order

A higher regression order (e.g., N = 2 in this chapter) preserves high-frequency components in filtered images, although it requires more computation (11). On the other hand, zeroth-order regression (N = 0) has a lower computational cost, but it has a stronger smoothing effect. Although second-order regression is preferable, it is only needed at pixel locations in texture and edge regions. Moreover, in terms of noise reduction, zeroth-order regression is more suitable in flat regions. We propose to adjust the order N locally, based on the scaling parameter (γ_i). Consequently, this adaptive approach keeps the total computational cost low while it preserves, and even enhances, high-frequency components.
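A minimal sketch of such a local order selection is given below. It assumes that a larger γ_i indicates stronger local structure (texture or edges) and that a single threshold separates the two cases; the function name `select_regression_order` and the threshold value are hypothetical and not taken from the chapter.

```python
import numpy as np

def select_regression_order(gamma, threshold=1.0):
    """Return a per-pixel order map: N = 2 where gamma exceeds threshold, else N = 0."""
    gamma = np.asarray(gamma, dtype=float)
    # Assumption: large gamma marks texture/edge pixels, small gamma marks flat regions.
    return np.where(gamma > threshold, 2, 0)

# Example: three pixels with hypothetical scaling parameters.
orders = select_regression_order([0.2, 1.5, 1.0])
```

With such a map, the more expensive second-order estimator is only evaluated where the map equals 2, and the cheaper zeroth-order estimator is used everywhere else.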

4.1 Block-by-Block Processing

The overall MASK algorithm consists of several operations (i.e., estimating spatial and temporal gradients, spatial orientations, and motions, as shown in Fig. 6, and finally applying kernel regression), and these can be implemented as, e.g., a pixel-by-pixel process or a batch process. In a pixel-by-pixel process, we estimate gradients, orientations, and motions one by one, and then finally estimate a pixel value. Note that most of these operations require calculations involving other pixels in a neighborhood around the pixel of interest. Since the neighborhoods of nearby pixels may overlap significantly, the same calculation would frequently be performed multiple times. Hence, a pixel-by-pixel implementation suffers from a large computational load. On the other hand, this implementation requires very little memory. In a batch process, we estimate gradients for all pixels in an entire frame and store the results in memory, then estimate orientations of all pixels and store those results, etc. The batch implementation avoids repeated calculations; however, it needs a large memory space to store intermediate results for all pixels in a frame. This type of process is impractical for a hardware implementation.
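The trade-off between the two implementations can be illustrated with a rough count. The sketch below is a simplified model under stated assumptions: it counts only gradient evaluations, assumes a square neighborhood of `window` × `window` pixels, and ignores frame borders. The function name and the cost model are hypothetical.

```python
def gradient_workload(height, width, window):
    """Rough gradient-evaluation and storage counts for the two implementations."""
    pixels = height * width
    # Pixel-by-pixel: every output pixel recomputes the gradients of its whole
    # neighborhood, but only one neighborhood is held in memory at a time.
    per_pixel = {"gradient_evals": pixels * window * window,
                 "stored_values": window * window}
    # Batch: each gradient is computed once, but all of them are kept in memory.
    batch = {"gradient_evals": pixels,
             "stored_values": pixels}
    return per_pixel, batch

per_pixel, batch = gradient_workload(1080, 1920, 5)
```

Under this model, a 5 × 5 neighborhood makes the pixel-by-pixel process perform 25 times as many gradient evaluations as the batch process, while the batch process stores one value per pixel of the frame for each intermediate result.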

As a compromise, in order to limit both the computational load and the use of memory, we process a video frame in a block-by-block manner, where each block contains, e.g., 8 × 8 or 16 × 16 pixels. Further reduction of the computational load is achieved by using a block-based motion model: we assume that, within a block, the motion of all the pixels follows a parametric model, e.g., translational or affine. In this chapter, we fix the block size to 8 × 8 pixels and we use the translational motion
