to perform higher level geometric operations than the vertex based stages do, before the
geometry is rasterized. While this is extremely useful in some situations, most algorithms
do not use the geometry shader. Those that do typically rely on special features that aren't
available in any other stage. The expansion of points into quads is a good example of
functionality that can't be performed in any of the other stages.
This limited use is probably due, at least in part, to the poor performance that the geometry
shader became known for during its debut in Direct3D 10, and to a corresponding lack
of development effort geared toward it. However, with the shared processor architectures
that most current generation GPUs use, sufficient processing power is available to use the
geometry shader. The biggest challenge is to ensure that the memory usage of the stage
is appropriately balanced with the tasks that are being performed. For example, if each
geometry shader invocation is used to produce 100 output triangles, the algorithm should
likely be reevaluated to take advantage of the tessellation stages, instead of the geometry
shader. However, for small-scale geometry manipulation, there is a much better balance of
memory usage to computation. This makes point sprite expansion an attractive target: it
performs some calculations and a small amount of data amplification. At the very least, it
will be interesting to see if more algorithms are developed to use the geometry shader in
the coming years, or if it will remain as a specialty stage in the pipeline.
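The point sprite expansion described above can be sketched on the CPU to show how small the amplification is: one input point becomes four output vertices. This is only an illustrative C++ emulation, not shader code from the text. The names `Vec3`, `QuadVertex`, and `expandPointToQuad` are hypothetical, and the sketch assumes that the camera's unit `right` and `up` axes are supplied in the same space as the point.

```cpp
#include <array>

// Minimal 3-component vector; a stand-in for an HLSL float3 (hypothetical type).
struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// One output vertex of the expanded quad: position plus texture coordinate.
struct QuadVertex { Vec3 position; float u, v; };

// Expands a single point into four vertices in triangle-strip order
// (bottom-left, top-left, bottom-right, top-right).
// Assumption: `right` and `up` are unit-length camera axes expressed in the
// same space as `center`; `halfSize` is half of the sprite's edge length.
std::array<QuadVertex, 4> expandPointToQuad(Vec3 center, Vec3 right, Vec3 up,
                                            float halfSize)
{
    const Vec3 r = scale(right, halfSize);  // half-extent along the right axis
    const Vec3 u = scale(up, halfSize);     // half-extent along the up axis
    const Vec3 negR = scale(r, -1.0f);
    const Vec3 negU = scale(u, -1.0f);

    return {{
        { add(center, add(negR, negU)), 0.0f, 1.0f },  // bottom-left
        { add(center, add(negR, u)),    0.0f, 0.0f },  // top-left
        { add(center, add(r, negU)),    1.0f, 1.0f },  // bottom-right
        { add(center, add(r, u)),       1.0f, 0.0f },  // top-right
    }};
}
```

In a real geometry shader the same four vertices would be appended to a triangle strip output stream. The work per invocation is a handful of additions and a fixed four-vertex output, which is the kind of small-scale amplification the paragraph above has in mind.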
3.13.4 Raster-Based Manipulations
The final portion of the pipeline consists of the raster-based stages. These include the rasterizer,
pixel shader, and output merger. These stages operate at a much higher frequency than the
previous stages, due to the data amplification that is performed in the rasterizer. This allows
per-pixel operations to be performed in these stages, which is why the
pixel shader is so frequently used to add all of the detail to a rendering. The rasterizer is
typically not a bottleneck in rendering algorithms, while the pixel shader can potentially be
either a calculation or memory bandwidth bottleneck, due to the large number of invocations
that are performed.
A somewhat less recognized performance issue can be introduced by the output
merger. Since it reads and writes to the depth stencil buffer and can potentially read and
write to the render targets when blending is enabled, the output merger can greatly increase
bandwidth usage. If you don't need to use the blending function, make sure it is disabled!
Likewise, if the depth buffer does not need to be updated (such as in a second
or third rendering pass), ensure that depth writing is disabled.
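To get a feel for how much traffic these settings control, the following C++ sketch estimates the bytes moved per pixel by the output merger. It is a deliberately rough model, not a measurement. The `OutputMergerState` struct and both functions are hypothetical, and the model assumes that every pixel passes the depth test, that depth writes only occur when depth testing is enabled, and that caches, compression, and early depth rejection can be ignored.

```cpp
#include <cstdint>

// Hypothetical summary of the output merger settings that affect bandwidth.
struct OutputMergerState {
    bool blendEnabled;       // blending reads the existing color before writing
    bool depthTestEnabled;   // depth testing reads the stored depth value
    bool depthWriteEnabled;  // depth writing stores a new depth value
};

// Rough bytes of render target and depth traffic for one pixel.
// Assumption: the pixel passes the depth test, so its color is always written.
std::uint64_t bytesPerPixel(const OutputMergerState& s,
                            std::uint32_t colorBytes,
                            std::uint32_t depthBytes)
{
    std::uint64_t bytes = colorBytes;                    // color write
    if (s.blendEnabled)     bytes += colorBytes;         // color read for blending
    if (s.depthTestEnabled) bytes += depthBytes;         // depth read for the test
    if (s.depthTestEnabled && s.depthWriteEnabled)
        bytes += depthBytes;                             // depth write
    return bytes;
}

// Total estimated traffic for a pass that touches `pixelCount` pixels.
std::uint64_t bytesPerPass(const OutputMergerState& s,
                           std::uint64_t pixelCount,
                           std::uint32_t colorBytes,
                           std::uint32_t depthBytes)
{
    return pixelCount * bytesPerPixel(s, colorBytes, depthBytes);
}
```

Under these assumptions, a 1920 x 1080 pass with a 4-byte color format and a 4-byte depth stencil format moves about 33 MB with blending and depth writes enabled, and about half of that with both disabled. In Direct3D 11 the corresponding switches are the `BlendEnable` member of `D3D11_RENDER_TARGET_BLEND_DESC` and the `DepthWriteMask` member of `D3D11_DEPTH_STENCIL_DESC`, which can be set to `D3D11_DEPTH_WRITE_MASK_ZERO`.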
In addition to these considerations, an entire class of algorithms is just now starting to
be developed that use the unordered access views in the pixel shader. The ability to perform
custom reading and writing of resources at arbitrary locations provides a totally new way to