The Rendering Pipeline - Practical Rendering and Computation with Direct3D 11

to perform higher level geometric operations than the vertex based stages do, before the

geometry is rasterized. While this is extremely useful in some situations, most algorithms

do not use the geometry shader. The algorithms that do use it generally rely on

special features of the stage that aren't available in any other

stage. The expansion of points into quads is a good example of functionality that can't be

performed in any of the other stages.
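
To make the point-to-quad expansion concrete, the following is a minimal CPU-side sketch in C++ of the arithmetic a geometry shader would perform for one input point. It is an illustration under stated assumptions, not the book's implementation: the Float3 and QuadVertex types and the ExpandPointToQuad function are hypothetical names, and the right and up vectors are assumed to be unit vectors spanning the sprite plane (for a camera-facing sprite, the camera's right and up axes).

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal types for illustration; not part of Direct3D 11.
struct Float3 { float x, y, z; };
struct QuadVertex { Float3 position; float u, v; };

// Expands one point into four vertices in triangle-strip order, mirroring
// what a geometry shader would append to its output stream.
// Assumption: 'right' and 'up' are unit vectors spanning the sprite plane.
std::array<QuadVertex, 4> ExpandPointToQuad(const Float3& center,
                                            const Float3& right,
                                            const Float3& up,
                                            float halfSize)
{
    assert(halfSize > 0.0f);

    // Corner signs along 'right' (sx) and 'up' (sy) in strip order:
    // bottom-left, top-left, bottom-right, top-right.
    const float sx[4] = { -1.0f, -1.0f,  1.0f, 1.0f };
    const float sy[4] = { -1.0f,  1.0f, -1.0f, 1.0f };

    std::array<QuadVertex, 4> quad{};
    for (int i = 0; i < 4; ++i) {
        quad[i].position = {
            center.x + halfSize * (sx[i] * right.x + sy[i] * up.x),
            center.y + halfSize * (sx[i] * right.y + sy[i] * up.y),
            center.z + halfSize * (sx[i] * right.z + sy[i] * up.z) };
        // Texture coordinates, with v increasing downward.
        quad[i].u = 0.5f * (sx[i] + 1.0f);
        quad[i].v = 0.5f * (1.0f - sy[i]);
    }
    return quad;
}
```

One point in, four vertices out: the amplification is small and fixed, which is the property that makes this kind of expansion a comfortable fit for the stage.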

This limited use is probably at least partially due to the poor performance that the geometry

shader became known for during its debut in Direct3D 10, and to a corresponding lack

of development effort geared toward it. However, with the shared processor architectures

that most current generation GPUs use, sufficient processing power is available to use the

geometry shader. The biggest challenge is to ensure that the memory usage of the stage

is appropriately balanced with the tasks that are being performed. For example, if each

geometry shader invocation is used to produce 100 output triangles, the algorithm should

likely be reevaluated to take advantage of the tessellation stages, instead of the geometry

shader. However, for small-scale geometry manipulation, there is a much better balance of

memory usage to computation. This makes point sprite expansion an attractive target—it

performs some calculations, and a small amount of data amplification. At the very least, it

will be interesting to see if more algorithms are developed to use the geometry shader in

the coming years, or if it will remain as a specialty stage in the pipeline.
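
The balance between output size and computation can be checked with simple arithmetic. The sketch below assumes the Direct3D 10/11 rule that a single geometry shader invocation may emit at most 1024 32-bit scalars in total (the declared maximum vertex count multiplied by the scalars per output vertex). The function names and the six-scalar vertex layout (a float4 position plus a float2 texture coordinate) are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed Direct3D 10/11 limit: total 32-bit scalars one geometry shader
// invocation may output (max vertex count * scalars per output vertex).
constexpr std::size_t kMaxGsOutputScalars = 1024;

// Worst-case number of scalars a single invocation declares as output.
inline std::size_t GsOutputScalars(std::size_t maxVertexCount,
                                   std::size_t scalarsPerVertex)
{
    assert(scalarsPerVertex > 0);
    return maxVertexCount * scalarsPerVertex;
}

// True if the declared output fits within the per-invocation limit.
inline bool FitsGsOutputLimit(std::size_t maxVertexCount,
                              std::size_t scalarsPerVertex)
{
    return GsOutputScalars(maxVertexCount, scalarsPerVertex)
           <= kMaxGsOutputScalars;
}
```

With this layout, a point sprite needs 4 vertices and therefore 24 scalars per invocation, far below the limit. One hundred separate triangles, emitted as individual three-vertex strips, would need 300 vertices and 1800 scalars, which does not fit at all. That is consistent with the advice above to move such workloads to the tessellation stages.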

3.13.4 Raster-Based Manipulations

The final portion of the pipeline is the raster-based stages. These include the rasterizer,

pixel shader, and output merger. These stages operate at a much higher frequency than the

previous stages, due to the data amplification that is performed in the rasterizer. This allows

for much higher frequency operations to be performed in these stages, which is why the

pixel shader is so frequently used to add all of the detail to a rendering. The rasterizer is

typically not a bottleneck in rendering algorithms, while the pixel shader can potentially be

either a calculation or memory bandwidth bottleneck, due to the large number of invocations

that are performed.

A somewhat less recognized performance issue can be introduced by the output

merger. Since it reads from and writes to the depth stencil buffer, and can potentially read from and

write to the render targets when blending is enabled, the output merger can greatly increase

bandwidth usage. If you don't need to use the blending function, make sure it is disabled!

Likewise, if there is no reason to expect updating of the depth buffer (such as in a second

or third rendering pass), ensure that the depth writing functionality is disabled.
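
A rough model shows why these two switches matter. In Direct3D 11 they correspond to the BlendEnable member of D3D11_RENDER_TARGET_BLEND_DESC and to setting DepthWriteMask to D3D11_DEPTH_WRITE_MASK_ZERO in D3D11_DEPTH_STENCIL_DESC. The C++ sketch below does not call the API. It only estimates an upper bound on the bytes moved per pass, and it ignores caches, compression, and early depth rejection. The OutputMergerConfig struct and EstimateOutputMergerBytes function are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the output merger state for one pass.
struct OutputMergerConfig {
    bool depthTestEnabled;             // depth buffer is read per pixel
    bool depthWriteEnabled;            // depth buffer is written per pixel
    bool blendEnabled;                 // render target is read before writing
    std::uint32_t colorBytesPerPixel;  // e.g. 4 for an 8-bit RGBA target
    std::uint32_t depthBytesPerPixel;  // e.g. 4 for a 32-bit depth stencil buffer
};

// Crude upper-bound estimate of bytes moved by the output merger.
// Assumes every pixel passes the depth test and touches memory directly.
std::uint64_t EstimateOutputMergerBytes(const OutputMergerConfig& cfg,
                                        std::uint64_t pixelCount)
{
    assert(cfg.colorBytesPerPixel > 0);
    std::uint64_t perPixel = cfg.colorBytesPerPixel;                // color write
    if (cfg.blendEnabled)      perPixel += cfg.colorBytesPerPixel;  // color read
    if (cfg.depthTestEnabled)  perPixel += cfg.depthBytesPerPixel;  // depth read
    if (cfg.depthWriteEnabled) perPixel += cfg.depthBytesPerPixel;  // depth write
    return perPixel * pixelCount;
}
```

Under this model, a 1920 x 1080 pass with blending and depth writes enabled moves up to 16 bytes per pixel, while the same pass with both disabled (depth test still on) moves up to 8 bytes per pixel, halving the estimated traffic.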

In addition to these considerations, an entire class of algorithms is just now starting to

be developed that uses unordered access views in the pixel shader. The ability to perform

custom reading and writing of resources at arbitrary locations provides a totally new way to
