3D region corresponding to the tile. In scenes with thousands of lights the
performance balance can tilt toward deferred shading.
Difficulties with deferred shading include the following.

Excess storage and bandwidth: Burdening each pixel in the framebuffer with the
information required for its shading is a significant expense in both storage
and bandwidth. Indexing helps: Textures can be specified by reference; even
better, the entire shader can be specified by reference. But parameters such as
surface normals and interpolated texture coordinates are specific to each
pixel, so they require storage by value. Variability in storage requirements
complicates the situation, because the maximum per-pixel storage requirement is
not easily inferred at the start of rendering, and neither Direct3D nor OpenGL
requires that it be specified.

Incompatibility with multi-sample anti-aliasing (MSAA): Multi-sample
anti-aliasing, the currently preferred approach to reducing edge artifacts in
full-scene rendering, requires storage for multiple color and depth samples at
each pixel. For example, multi-sample anti-aliasing with four samples per pixel
increases framebuffer storage requirements by a factor of four. In combination
with deferred shading, multi-sample anti-aliasing increases already burdensome
framebuffer storage requirements by this same factor. Shading calculations are
also increased by this factor, undoing the central optimization of multi-sample
anti-aliasing, which is limiting shading calculations to one per pixel. The
real possibility of a net increase in shading calculations, and the certainty
of increased storage, make deferred shading incompatible with multi-sample
anti-aliasing.[17]

No shader-specified visibility: High-performance rendering sometimes
approximates the visibility of complex geometry, such as foliage, with an alpha
matte rendered as a texture. This optimization is inconsistent with deferred
shading, whose goal is full determination of visibility prior to shading.

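To make the storage argument concrete, the following is a rough arithmetic sketch, not a description of any real framebuffer format. The per-sample layout (depth, normal, texture coordinates, and a shader index stored by reference) and the 1920 x 1080 resolution are assumptions chosen only for illustration; the function name `gbuffer_bytes` is hypothetical.

```python
def gbuffer_bytes(width, height, bytes_per_sample, samples_per_pixel=1):
    """Total storage, in bytes, for a deferred-shading framebuffer."""
    return width * height * bytes_per_sample * samples_per_pixel


# Hypothetical per-sample layout (an assumption, not taken from the text):
# 4 B depth + 8 B surface normal + 8 B texture coordinates + 4 B shader index.
BYTES_PER_SAMPLE = 4 + 8 + 8 + 4

single = gbuffer_bytes(1920, 1080, BYTES_PER_SAMPLE)
msaa4 = gbuffer_bytes(1920, 1080, BYTES_PER_SAMPLE, samples_per_pixel=4)

print(f"1 sample per pixel:  {single / 2**20:.1f} MiB")
print(f"4 samples per pixel: {msaa4 / 2**20:.1f} MiB")
print(f"growth factor: {msaa4 // single}x")
```

Whatever layout is assumed, the growth factor equals the sample count, which is the point made above: four-sample anti-aliasing multiplies an already large per-pixel record by four.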
Against this seemingly bleak background, the story of deferred shading has a
surprisingly happy ending. All modern GPUs, including the GeForce 9800 GTX,
implement an optimization called early z-cull. An outline of the algorithm
follows. As the frame is rendered, the GPU builds a hierarchical structure of
z-values in dedicated local memory. Rasterized fragments are tested against
this z-pyramid and are either culled (prior to shading) if they are not
visible, or delivered to the framebuffer (and added to the z-pyramid) if they
are visible. Little extra storage, and no main-memory bandwidth, are required,
yet much of the performance gain of deferred shading is achieved, especially
when programmers deliver scene data in an order that approximates front to back
(i.e., rendering objects that are progressively farther from the viewpoint).
While exact front-to-back ordering is a huge burden to programmers,
approximating this ordering is often straightforward, and the penalty for
partial failure is low (just a slight decrease in performance). Alternatively,
application programmers sometimes choose to render the entire scene twice
(first with shading disabled to create the full z-pyramid, and then again with
shading enabled to shade exactly the visible fragments) to fully achieve

[17] At the time of this writing, researchers are actively investigating
deferred-shading algorithms that are compatible with multi-sample
anti-aliasing.
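
The early z-cull behavior described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: real GPUs implement the z-pyramid in hardware with more levels and different details, whereas the class `ZPyramid` here is a hypothetical two-level structure (per-pixel depths plus a per-tile maximum), and the tile size and test scene are arbitrary. The model counts how many fragments reach the "shader" when the same scene is submitted front to back, back to front, and with a depth-only first pass.

```python
import math

TILE = 4  # coarse-level tile size in pixels (arbitrary choice for this sketch)


class ZPyramid:
    """Two-level depth structure: per-pixel depths plus a per-tile maximum."""

    def __init__(self, width, height):
        self.fine = [[math.inf] * width for _ in range(height)]
        self.coarse = [[math.inf] * math.ceil(width / TILE)
                       for _ in range(math.ceil(height / TILE))]

    def visible(self, x, y, z, allow_equal=False):
        # Coarse test: a fragment behind everything in its tile is culled
        # without consulting the per-pixel level.
        tile_max = self.coarse[y // TILE][x // TILE]
        if z > tile_max or (z == tile_max and not allow_equal):
            return False
        stored = self.fine[y][x]
        return z < stored or (allow_equal and z == stored)

    def write(self, x, y, z):
        self.fine[y][x] = z
        ty, tx = y // TILE, x // TILE
        rows = self.fine[ty * TILE:(ty + 1) * TILE]
        self.coarse[ty][tx] = max(
            d for row in rows for d in row[tx * TILE:(tx + 1) * TILE])


def render(fragments, width, height):
    """Single pass with early z-cull; returns the number of fragments shaded."""
    zp = ZPyramid(width, height)
    shaded = 0
    for x, y, z in fragments:
        if zp.visible(x, y, z):
            shaded += 1  # stand-in for running the fragment shader
            zp.write(x, y, z)
    return shaded


def render_with_prepass(fragments, width, height):
    """Depth-only pass first, then shade only fragments matching final depth."""
    zp = ZPyramid(width, height)
    for x, y, z in fragments:  # pass 1: shading disabled
        if zp.visible(x, y, z):
            zp.write(x, y, z)
    # pass 2: shading enabled, depth test passes on equality
    return sum(1 for x, y, z in fragments
               if zp.visible(x, y, z, allow_equal=True))


def layers(width, height, depths):
    """Full-screen layers, one per depth value, in the given submission order."""
    return [(x, y, z) for z in depths
            for y in range(height) for x in range(width)]


W, H, N = 8, 8, 5
front_to_back = layers(W, H, range(1, N + 1))
back_to_front = layers(W, H, range(N, 0, -1))

print("front to back:", render(front_to_back, W, H))
print("back to front:", render(back_to_front, W, H))
print("with prepass: ", render_with_prepass(back_to_front, W, H))
```

In this model, front-to-back submission shades one fragment per pixel (64), back-to-front submission shades every fragment (320), and the two-pass approach shades one fragment per pixel regardless of submission order, at the cost of processing the geometry twice.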
 