Fig. 3 Representations of a 3D scene: (a) epipolar image, (b) side-by-side stereoscopic pair, (c) 2D+Z image pair, and (d) mesh
multiview image [17]. A relatively simple way to store a multiview image is to combine all observations in a single bitmap. For a stereoscopic image, both views can be stored in a side-by-side fashion, as shown in Fig. 3b. A more sophisticated approach is to encode the differences between the observations, similarly to the way temporal similarities are encoded in a video file, as done in MPEG-4 MVC [20]. Multiview images are among the formats most often used for natural scene description.
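As an illustration, the following is a minimal sketch of side-by-side packing, assuming the two views are already loaded as NumPy arrays of equal size (the function name is ours, not part of any standard):

import numpy as np

def pack_side_by_side(left, right):
    # Combine the two views of a stereoscopic pair into one bitmap,
    # as in Fig. 3b. Both views are H x W x 3 arrays of equal size.
    assert left.shape == right.shape
    return np.hstack([left, right])

The same idea extends to more than two views by tiling all observations into a single grid before compression.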
The second group of scene representations is video-plus-depth, where each pixel is augmented with information about its distance from the camera. A straightforward way to represent video-plus-depth is to encode the depth map as a greyscale picture and to place the 2D image and its depth map side-by-side. The intensity of each depth-map pixel represents the depth of the corresponding pixel in the 2D image. Such a format is sometimes referred to as 2D+Z, and an example of this representation of a scene is shown in Fig. 3c.
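A minimal sketch of producing such a frame, assuming the depth map arrives as a floating-point array of metric distances and z_near/z_far define the chosen quantisation range (all names here are illustrative):

import numpy as np

def pack_2d_plus_z(image, depth, z_near, z_far):
    # Quantise metric depth linearly to 8-bit grey values; by the usual
    # convention, brighter pixels are closer to the camera.
    d = np.clip((z_far - depth) / (z_far - z_near), 0.0, 1.0)
    grey = (255.0 * d).astype(np.uint8)
    # Replicate the grey channel so both halves have three channels,
    # then place image and depth map side by side, as in Fig. 3c.
    grey3 = np.repeat(grey[:, :, None], 3, axis=2)
    return np.hstack([image, grey3])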
The video-plus-depth format can be used to render virtual views based on the geometrical information about the scene encoded in the depth map. Thus, it is suitable for multiview displays and can be used regardless of the number of views a particular screen provides [17, 21]. Furthermore, video-plus-depth can be efficiently compressed. Recently, MPEG specified a container
format for video-plus-depth data, known as MPEG-C Part 3 [20]. On the downside, rendering scene observations from a 2D+Z description requires disocclusion filling, which can introduce artifacts. This is addressed by using layered depth images (LDI) [17] or by multi-video-plus-depth encoding [22].
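To make the rendering step concrete, here is a deliberately simplified sketch of depth-image-based view synthesis: each pixel is shifted horizontally by a disparity proportional to its inverse depth, and target pixels that no source pixel lands on are exactly the disocclusions that must be filled. The baseline scale and the crude fill rule are illustrative choices, not part of any standard:

import numpy as np

def render_virtual_view(image, depth, baseline=8.0):
    # image: H x W x 3 picture; depth: H x W metric distances.
    h, w = depth.shape
    view = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # Visit pixels from farthest to nearest so near pixels win overlaps.
    for y in range(h):
        for x in np.argsort(-depth[y]):
            d = int(round(baseline / max(depth[y, x], 1e-6)))
            if 0 <= x + d < w:
                view[y, x + d] = image[y, x]
                filled[y, x + d] = True
    # Disocclusion filling: copy the last filled pixel from the left,
    # a crude stand-in for the inpainting methods used in practice.
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x]:
                view[y, x] = view[y, x - 1]
    return view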
A dense depth map is not captured directly, but it can be derived from multiview images (using depth-estimation algorithms) or from point-cloud data captured by range sensors. In the case of a synthetic 3D scene, obtaining a dense depth map is straightforward, since solving the occlusions during rendering already requires calculating the distance between the camera and each pixel of the image [23].
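For the synthetic case, the renderer's depth buffer already holds the required distances; with a perspective projection they only need to be linearised. A sketch of the standard conversion, assuming an OpenGL-style buffer with values in [0, 1]:

def linearize_depth(z_buf, z_near, z_far):
    # Convert a normalised perspective depth-buffer value in [0, 1]
    # back to metric distance along the viewing axis.
    z_ndc = 2.0 * z_buf - 1.0
    return 2.0 * z_near * z_far / (z_far + z_near - z_ndc * (z_far - z_near))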
The third group of representations stores scene geometry in a vectorized form. One example is a dynamic 3D mesh [20].
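To fix ideas, a minimal sketch of how a dynamic mesh can be laid out: the triangle connectivity is stored once, while the vertex positions are stored per frame (the layout is illustrative, not the actual MPEG-4 encoding):

import numpy as np

# Static connectivity: each row holds the vertex indices of one triangle.
triangles = np.array([[0, 1, 2],
                      [0, 2, 3]])

# Dynamic geometry: one V x 3 array of vertex positions per frame;
# the connectivity above is reused for every frame.
frames = [
    np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]]),
    np.array([[0., 0., .2], [1., 0., .2], [1., 1., 0.], [0., 1., 0.]]),
]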
A dynamic mesh is well suited to synthetic content, since synthetic 3D scenes are described in terms of shapes and