Fig. 3 Representations of a 3D scene: (a) epipolar image, (b) side-by-side stereoscopic pair, (c) 2D+Z image pair, and (d) mesh
multiview image [17]. A relatively simple way to store a multiview image is to combine all observations in a single bitmap. For a stereoscopic image, both views can be stored in a side-by-side fashion, as shown in Fig. 3b. A more sophisticated approach is to encode the differences between the observations, similarly to the way temporal similarities are encoded in a video file, as done in MPEG-4 MVC [20]. Multiview images are among the formats most often used for natural scene description.
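As an illustration, the following is a minimal sketch of side-by-side packing, assuming the two views are already loaded as NumPy arrays of equal size (the function name is ours, not part of any standard):

import numpy as np

def pack_side_by_side(left, right):
    # Combine the two views of a stereoscopic pair into one bitmap,
    # as in Fig. 3b. Both views are H x W x 3 arrays of equal size.
    assert left.shape == right.shape
    return np.hstack([left, right])

The same idea extends to more than two views by tiling all observations into a single grid before compression.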
The second group of scene representations is video-plus-depth, where each pixel is augmented with information about its distance from the camera. A straightforward way to represent video-plus-depth is to encode the depth map as a greyscale picture and to place the 2D image and its depth map side-by-side. The intensity of each depth-map pixel represents the depth of the corresponding pixel in the 2D image. Such a format is sometimes referred to as 2D+Z, and an example of this representation of a scene is shown in Fig. 3c.
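A minimal sketch of producing such a frame, assuming the depth map arrives as a floating-point array of metric distances and z_near/z_far define the chosen quantisation range (all names here are illustrative):

import numpy as np

def pack_2d_plus_z(image, depth, z_near, z_far):
    # Quantise metric depth linearly to 8-bit grey values; by the usual
    # convention, brighter pixels are closer to the camera.
    d = np.clip((z_far - depth) / (z_far - z_near), 0.0, 1.0)
    grey = (255.0 * d).astype(np.uint8)
    # Replicate the grey channel so both halves have three channels,
    # then place image and depth map side by side, as in Fig. 3c.
    grey3 = np.repeat(grey[:, :, None], 3, axis=2)
    return np.hstack([image, grey3])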
The video-plus-depth format can be used to render virtual views based on the geometrical information about the scene encoded in the depth map. Thus, it is suitable for multiview displays and can be used regardless of the number of views a particular screen provides [17, 21]. Furthermore, video-plus-depth can be efficiently compressed. Recently, MPEG specified a container
format for video-plus-depth data, known as MPEG-C Part 3 [20]. On the downside, rendering scene observations from a 2D+Z description requires disocclusion filling, which can introduce artifacts. This is addressed by using layered depth images (LDI) [17] or by multi-video-plus-depth encoding [22].
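To make the rendering step concrete, here is a deliberately simplified sketch of depth-image-based view synthesis: each pixel is shifted horizontally by a disparity proportional to its inverse depth, and target pixels that no source pixel lands on are exactly the disocclusions that must be filled. The baseline scale and the crude fill rule are illustrative choices, not part of any standard:

import numpy as np

def render_virtual_view(image, depth, baseline=8.0):
    # image: H x W x 3 picture; depth: H x W metric distances.
    h, w = depth.shape
    view = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # Visit pixels from farthest to nearest so near pixels win overlaps.
    for y in range(h):
        for x in np.argsort(-depth[y]):
            d = int(round(baseline / max(depth[y, x], 1e-6)))
            if 0 <= x + d < w:
                view[y, x + d] = image[y, x]
                filled[y, x + d] = True
    # Disocclusion filling: copy the last filled pixel from the left,
    # a crude stand-in for the inpainting methods used in practice.
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x]:
                view[y, x] = view[y, x - 1]
    return view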
A dense depth map is not captured directly, but it can be derived from multiview images (using depth-estimation algorithms) or from point-cloud data captured by range sensors. In the case of a synthetic 3D scene, obtaining a dense depth map is straightforward, since solving the occlusions during rendering already requires calculating the distance between the camera and each pixel of the image [23].
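For the synthetic case, the renderer's depth buffer already holds the required distances; with a perspective projection they only need to be linearised. A sketch of the standard conversion, assuming an OpenGL-style buffer with values in [0, 1]:

def linearize_depth(z_buf, z_near, z_far):
    # Convert a normalised perspective depth-buffer value in [0, 1]
    # back to metric distance along the viewing axis.
    z_ndc = 2.0 * z_buf - 1.0
    return 2.0 * z_near * z_far / (z_far + z_near - z_ndc * (z_far - z_near))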
The third group of representations stores scene geometry in a vectorized form. One example is a dynamic 3D mesh [20].
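To fix ideas, a minimal sketch of how a dynamic mesh can be laid out: the triangle connectivity is stored once, while the vertex positions are stored per frame (the layout is illustrative, not the actual MPEG-4 encoding):

import numpy as np

# Static connectivity: each row holds the vertex indices of one triangle.
triangles = np.array([[0, 1, 2],
                      [0, 2, 3]])

# Dynamic geometry: one V x 3 array of vertex positions per frame;
# the connectivity above is reused for every frame.
frames = [
    np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]]),
    np.array([[0., 0., .2], [1., 0., .2], [1., 1., 0.], [0., 1., 0.]]),
]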
A dynamic mesh is well suited to synthetic content, since synthetic 3D scenes are described in terms of shapes and