However, the difficulty of the motion-based approach is that, even if the motion
vector field is refined and/or smoothed, more complex effects (e.g., occlusions,
transparency, and reflection) are not accurately handled. That is, motion errors
are inevitable even after smoothing/refining the motion vector fields, and hence a
mechanism that accounts for these errors is necessary for producing artifact-free
outputs.
Unlike video processing algorithms that depend directly on motion vectors, the
recent work of Protter et al. [11] proposed a video-to-video super-resolution method
without explicit motion estimation or compensation based on the idea of Non-Local
Means [12]. Although the method produces impressive spatial upscaling results even
without motion estimation, the computational load is very high due to the exhaustive
search (across space and time) for blocks similar to the block of interest. In a related
work [13], we presented a space-time video upscaling method, called 3-D iterative
steering kernel regression (3-D ISKR), in which explicit subpixel motion estima-
tion is again avoided. 3-D ISKR is an extension of 2-D steering kernel regression
(SKR) proposed in [14, 15]. SKR is closely related to bilateral filtering [16, 17] and
normalized convolution [18]. These methods can achieve accurate and robust image
reconstruction results, due to their use of robust error norms and locally adaptive
weighting functions. 2-D SKR has been applied to spatial interpolation, denoising
and deblurring [15, 18, 19]. In 3-D ISKR, instead of relying on motion vectors, the
3-D kernel captures local spatial and temporal orientations based on local covari-
ance matrices of gradients of video data. With the adaptive kernel, the method is
capable of upscaling video with complex motion both in space and time.
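The local orientation analysis underlying 3-D ISKR can be illustrated with a short sketch. The following is not the authors' implementation but a minimal example, assuming a grayscale video stored as a NumPy array of shape (T, H, W); it forms the local covariance matrix (structure tensor) of the spatiotemporal gradients in a small window, whose eigenvectors indicate the dominant local space-time orientation along which a steering kernel could be elongated:

```python
import numpy as np

def local_gradient_covariance(video, t, y, x, radius=2):
    """Illustrative sketch: 3x3 covariance matrix of spatiotemporal
    gradients in a (2*radius+1)^3 window around (t, y, x).  Assumes
    the point is at least `radius` samples away from all borders."""
    # Central-difference gradients along t, y, and x.
    gt, gy, gx = np.gradient(video.astype(float))
    # Collect gradient samples from the local window.
    win = (slice(t - radius, t + radius + 1),
           slice(y - radius, y + radius + 1),
           slice(x - radius, x + radius + 1))
    G = np.stack([gt[win].ravel(), gy[win].ravel(), gx[win].ravel()], axis=1)
    # 3x3 local covariance of the gradient samples.
    return G.T @ G / G.shape[0]

# Example: a static sequence whose only structure varies along x,
# so the dominant eigenvector of C aligns with the x axis.
video = np.tile(np.sin(np.linspace(0, 6 * np.pi, 32))[None, None, :],
                (9, 32, 1))
C = local_gradient_covariance(video, t=4, y=16, x=16)
evals, evecs = np.linalg.eigh(C)  # principal axes of local orientation
```

In the actual method, such covariance matrices are computed at every sample and used to shape (steer) the regression kernels, rather than inspected via an explicit eigendecomposition as in this toy example.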
In this chapter, we build upon the 2-D steering kernel regression framework
proposed in [14], and develop a spatiotemporal (3-D) framework for processing
video. Specifically, we propose an approach we call motion-assisted steering kernel
(MASK) regression. The MASK function is a 3-D kernel; however, unlike 3-D
ISKR, it takes spatial (2-D) orientation and the local motion trajectory into
account separately, using an analysis of the local orientation and the local motion
vector to steer the spatiotemporal regression kernels. Subsequently, local
kernel regression is applied to compute weighted least-squares optimal pixel esti-
mates. Although 2-D kernel regression has been applied to achieve super-resolution
reconstruction through fusion of multiple pre-registered frames on to a 2-D plane
[14, 18], the proposed method is different in that it does not require explicit mo-
tion compensation of the video frames. Instead, we use 3-D weighting kernels that
are “warped” according to estimated motion vectors, such that the regression pro-
cess acts directly upon the video data. Although we consider local motion vectors
in MASK, we propose an algorithm that is robust against errors in the estimated
motion field. Prior multi-frame resolution enhancement or super-resolution (SR)
reconstruction methods [2, 3] typically consider only global translational or affine
motion, leaving local motion and object occlusions unaddressed. Many SR methods
also require explicit motion compensation, which may involve interpolation or
rounding of displacements to grid locations. These issues can have a negative
impact on accuracy
and robustness. Our proposed method is capable of handling local motions, avoids
explicit motion compensation, and is more robust. The proposed MASK approach is
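The idea of weighting kernels "warped" along estimated motion vectors, followed by local kernel regression, can be sketched as follows. This is a hedged illustration, not the chapter's implementation: the smoothing parameters `h_s` and `h_t`, the zeroth-order (weighted-average) regression, and the plain Gaussian shape are all simplifying assumptions made for the example.

```python
import numpy as np

def mask_weight(dt, dy, dx, motion, h_s=1.0, h_t=1.0):
    """Hedged sketch of a motion-assisted weight: a spatial Gaussian
    whose center at temporal offset dt is shifted along the estimated
    motion vector (vy, vx), times a temporal Gaussian penalty."""
    vy, vx = motion
    # "Warp": measure spatial distance relative to the motion trajectory.
    ry, rx = dy - vy * dt, dx - vx * dt
    return (np.exp(-(ry**2 + rx**2) / (2 * h_s**2))
            * np.exp(-dt**2 / (2 * h_t**2)))

def mask_estimate(video, t, y, x, motion, radius=2):
    """Zeroth-order kernel regression: the weighted least-squares
    estimate with constant local model, i.e. a weighted average of
    the spatiotemporal samples under the warped kernel."""
    num = den = 0.0
    T, H, W = video.shape
    for dt in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                tt, yy, xx = t + dt, y + dy, x + dx
                if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                    w = mask_weight(dt, dy, dx, motion)
                    num += w * video[tt, yy, xx]
                    den += w
    return num / den
```

Because the weights follow the motion trajectory, samples from neighboring frames that belong to the same moving structure receive large weights without ever being explicitly motion-compensated onto a common grid; higher-order local models and robustness to motion errors are what distinguish the full method from this sketch.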