Modern Graphics Hardware - Computer Graphics: Principles and Practice


3D region corresponding to the tile. In scenes with thousands of lights the performance balance can tilt toward deferred shading.

Difficulties with deferred shading include the following.

• Excess storage and bandwidth: Burdening each pixel in the framebuffer with the information required for its shading is a significant expense in both storage and bandwidth. Indexing helps: Textures can be specified by reference; even better, the entire shader can be specified by reference. But parameters such as surface normals and interpolated texture coordinates are specific to each pixel, so they require storage by value. Variability in storage requirements complicates the situation, because the maximum per-pixel storage requirement is not easily inferred at the start of rendering, and neither Direct3D nor OpenGL requires that it be specified.

• Incompatibility with multi-sample anti-aliasing (MSAA): Multi-sample anti-aliasing, the currently preferred approach to reducing edge artifacts in full-scene rendering, requires storage for multiple color and depth samples at each pixel. For example, multi-sample anti-aliasing with four samples per pixel increases framebuffer storage requirements by a factor of four. In combination with deferred shading, multi-sample anti-aliasing increases already burdensome framebuffer storage requirements by this same factor. Shading calculations are also increased by this factor, undoing the central optimization of multi-sample anti-aliasing, which is limiting shading calculations to one per pixel. The real possibility of a net increase in shading calculations, and the certainty of increased storage, make deferred shading incompatible with multi-sample anti-aliasing.¹⁷

• No shader-specified visibility: High-performance rendering sometimes approximates the visibility of complex geometry, such as foliage, with an alpha matte rendered as a texture. This optimization is inconsistent with deferred shading, whose goal is full determination of visibility prior to shading.
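
To make the storage argument in the first two difficulties concrete, the short sketch below estimates framebuffer size when every sample stores its shading inputs by value. The per-sample layout, the byte counts, and the 1920 × 1080 resolution are all assumptions chosen for illustration; they are not taken from the text or from any particular GPU or API.

```python
def gbuffer_bytes(width, height, bytes_per_sample, samples_per_pixel=1):
    """Total storage when each sample holds its own shading inputs by value."""
    return width * height * bytes_per_sample * samples_per_pixel


# Hypothetical per-sample layout in bytes (illustrative values only).
LAYOUT = {
    "depth": 4,
    "normal": 8,      # stored by value: specific to each pixel
    "texcoord": 8,    # stored by value: specific to each pixel
    "albedo": 4,
    "shader_id": 4,   # stored by reference: an index, not the shader itself
}
BYTES_PER_SAMPLE = sum(LAYOUT.values())

# Assumed resolution; any other size scales the result linearly.
single = gbuffer_bytes(1920, 1080, BYTES_PER_SAMPLE)
msaa4 = gbuffer_bytes(1920, 1080, BYTES_PER_SAMPLE, samples_per_pixel=4)

print(f"1 sample per pixel:  {single / 2**20:.1f} MiB")
print(f"4 samples per pixel: {msaa4 / 2**20:.1f} MiB")
```

Under these assumptions the four-sample buffer is exactly four times the size of the single-sample one, which is the factor the MSAA item describes.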

Against this seemingly bleak background, the story of deferred shading has a surprisingly happy ending. All modern GPUs, including the GeForce 9800 GTX, implement an optimization called early z-cull. An outline of the algorithm follows. As the frame is rendered, the GPU builds a hierarchical structure of z-values in dedicated local memory. Rasterized fragments are tested against this z-pyramid and are either culled (prior to shading) if they are not visible, or delivered to the framebuffer (and added to the z-pyramid) if they are visible. Little extra storage, and no main-memory bandwidth, are required, yet much of the performance gain of deferred shading is achieved, especially when programmers deliver scene data in an order that approximates front to back (i.e., rendering objects that are progressively farther from the viewpoint). While exact front-to-back ordering is a huge burden to programmers, approximating this ordering is often straightforward, and the penalty for partial failure is low (just a slight decrease in performance). Alternatively, application programmers sometimes choose to render the entire scene twice (first with shading disabled to create the full z-pyramid, and then again with shading enabled to shade exactly the visible fragments) to fully achieve

17. At the time of this writing, researchers are actively investigating deferred-shading algorithms that are compatible with multi-sample anti-aliasing.
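
The effect of submission order on early z-cull can be illustrated with a small simulation. This is a minimal sketch, not the hardware algorithm: a flat per-pixel depth array stands in for the hierarchical z-pyramid, the scene is a hypothetical stack of full-width layers at distinct depths, and the function names `render` and `render_with_prepass` are invented for the example.

```python
import random


def render(fragments, width):
    """Depth-test fragments in submission order.

    Returns (shaded_count, depth_buffer). A fragment is counted as shaded
    only if it passes the depth test when it arrives, which models culling
    before shading.
    """
    depth = [float("inf")] * width
    shaded = 0
    for x, z in fragments:
        if z < depth[x]:
            depth[x] = z
            shaded += 1
    return shaded, depth


def render_with_prepass(fragments, width):
    """Two passes: depth only, then shade fragments matching the final depth."""
    _, depth = render(fragments, width)  # pass 1: shading disabled
    return sum(1 for x, z in fragments if z == depth[x])  # pass 2


WIDTH, LAYERS = 64, 8  # assumed scene size, for illustration only


def scene(layer_order):
    """One fragment per pixel per layer; layer k lies at depth k + 1."""
    return [(x, k + 1.0) for k in layer_order for x in range(WIDTH)]


shuffled_order = list(range(LAYERS))
random.Random(0).shuffle(shuffled_order)

front_to_back, _ = render(scene(range(LAYERS)), WIDTH)
back_to_front, _ = render(scene(reversed(range(LAYERS))), WIDTH)
shuffled, _ = render(scene(shuffled_order), WIDTH)
prepass = render_with_prepass(scene(shuffled_order), WIDTH)

print("front to back:", front_to_back)
print("back to front:", back_to_front)
print("shuffled:     ", shuffled)
print("with prepass: ", prepass)
```

In this toy scene, front-to-back submission shades one fragment per pixel and back-to-front shades every fragment, with an arbitrary order falling somewhere in between. The two-pass variant shades one fragment per pixel regardless of order, at the cost of processing the geometry twice.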
