Graphics Reference
In-Depth Information
// The depth test will run directly on the interpolated value in
// Q.z/Q.w, which is going to be smallest at the far plane
gpu->setDepthTest(RenderDevice::DEPTH_GREATER);
gpu->setDepthClearValue(0.0);
while (! done) {
    loopBody(gpu);
    processUserInput();
}
...
15.8 Performance and Optimization
We'll now consider several examples of optimization in hardware-based
rendering. This is by no means an exhaustive list, but rather a set of model
techniques from which you can draw ideas to generate your own optimizations
when you need them.
15.8.1 Abstraction Considerations
Many performance optimizations will come at the price of significantly
complicating the implementation. Weigh the performance advantage of an
optimization against the additional cost of debugging and code maintenance.
High-level algorithmic optimizations may require significant thought and
restructuring of code, but they tend to yield the best tradeoff of performance
for code complexity. For example, simply dividing the screen in half and
asynchronously rendering each side on a separate processor nearly doubles
performance at the cost of perhaps 50 additional lines of code that do not
interact with the inner loop of the renderer.
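The split-screen idea can be sketched in a few lines of standard C++. This is a minimal illustration under stated assumptions, not the chapter's renderer: `shadePixel`, `renderColumns`, and `renderSplitScreen` are hypothetical names, and the per-pixel work is a placeholder for the real inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the renderer's per-pixel work (an assumption,
// not code from the chapter). It returns a deterministic value per pixel.
inline float shadePixel(int x, int y) {
    return static_cast<float>(x) + 0.5f * static_cast<float>(y);
}

// Renders columns [x0, x1) of a row-major width-by-height image.
// Callers pass disjoint column ranges, so no locking is required.
void renderColumns(std::vector<float>& image, int width, int height,
                   int x0, int x1) {
    for (int y = 0; y < height; ++y) {
        for (int x = x0; x < x1; ++x) {
            image[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        }
    }
}

// Renders the left half on a worker thread while the calling thread
// renders the right half, then waits for the worker to finish.
std::vector<float> renderSplitScreen(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<float> image(static_cast<std::size_t>(width) * height, 0.0f);
    const int mid = width / 2;

    std::future<void> left = std::async(std::launch::async,
        [&image, width, height, mid] {
            renderColumns(image, width, height, 0, mid);
        });
    renderColumns(image, width, height, mid, width);
    left.get();  // block until the left half is complete
    return image;
}
```

Note that the parallel code wraps the inner loop rather than modifying it, which is what keeps the added complexity low. The actual speedup depends on how evenly the work divides between the two halves.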
In contrast, consider some low-level optimizations that we intentionally passed
over. These include eliminating common subexpressions (e.g., mapping all of
those repeated divisions to multiplications by an inverse that is computed
once) and lifting constants outside loops. Performing those optimizations
obscures the clarity of the algorithm, but will probably gain only a 50%
throughput improvement.
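To make the tradeoff concrete, here is a small sketch of both low-level rewrites applied to one loop. The function names and the computation itself are hypothetical, chosen only to show the transformation; they do not come from the chapter's renderer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct version: reads like the math, but performs one division per
// element and recomputes the loop-invariant product on every iteration.
void scaleNaive(std::vector<float>& values, float range, float gain) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] = (values[i] / range) * (gain * 2.0f);
    }
}

// Hand-optimized version: the inverse is computed once and the constant
// factor is lifted out of the loop, leaving one multiply per element.
void scaleHoisted(std::vector<float>& values, float range, float gain) {
    assert(range != 0.0f);
    const float scale = (1.0f / range) * (gain * 2.0f);
    for (float& v : values) {
        v *= scale;
    }
}
```

The second version no longer mirrors the original expression, which is the loss of clarity described above. It can also differ from the first in the last bits of a floating-point result, because multiplying by a rounded reciprocal is not always identical to dividing.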
This is not to say that low-level optimizations are not worthwhile. But they are
primarily worthwhile when you have completed your high-level optimizations;
at that point you are more willing to complicate your code and its maintenance
because you are done adding features.
15.8.2 Architectural Considerations
The primary difference between the simple rasterizer and ray caster described
in this chapter is that the “for each pixel” and “for each triangle” loops have the
opposite nesting. This is a trivial change and the body of the inner loop is largely
similar in each case. But the trivial change has profound implications for memory
access patterns and how we can algorithmically optimize each.
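The two loop orders can be sketched side by side. This is a simplified 2D illustration, assuming a hypothetical `covers` test on pixel centers in place of the chapter's full visibility and shading code; all type and function names here are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point2 { float x, y; };
struct Triangle2 { Point2 a, b, c; };

// Signed area term: positive when p lies to the left of the edge u -> v.
inline float edge(const Point2& u, const Point2& v, const Point2& p) {
    return (v.x - u.x) * (p.y - u.y) - (v.y - u.y) * (p.x - u.x);
}

// True when the center of pixel (px, py) is inside t, for either winding.
inline bool covers(const Triangle2& t, int px, int py) {
    const Point2 p{px + 0.5f, py + 0.5f};
    const float e0 = edge(t.a, t.b, p);
    const float e1 = edge(t.b, t.c, p);
    const float e2 = edge(t.c, t.a, p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}

// Rasterizer order: triangles outermost. Each triangle is read once, but
// the image is revisited for every triangle.
std::vector<int> rasterizerOrder(const std::vector<Triangle2>& tris,
                                 int w, int h) {
    std::vector<int> hits(static_cast<std::size_t>(w) * h, 0);
    for (const Triangle2& t : tris)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                if (covers(t, x, y)) ++hits[static_cast<std::size_t>(y) * w + x];
    return hits;
}

// Ray-caster order: pixels outermost. Each pixel is written in one pass,
// but the whole triangle list is traversed again for every pixel.
std::vector<int> rayCasterOrder(const std::vector<Triangle2>& tris,
                                int w, int h) {
    std::vector<int> hits(static_cast<std::size_t>(w) * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (const Triangle2& t : tris)
                if (covers(t, x, y)) ++hits[static_cast<std::size_t>(y) * w + x];
    return hits;
}
```

Both functions produce the same per-pixel coverage counts and share the same inner-loop body. Only the order in which triangle data and image data are touched differs, and that order is what drives the memory behavior.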
Scene triangles are typically stored in the heap. They may be in a flat 1D
array, or arranged in a more sophisticated data structure. If they are in a
simple data structure such as an array, then we can ensure reasonable memory
coherence by iterating through them in the same order that they appear in
memory. That produces efficient cache behavior. However, that iteration also
requires substantial
 
 
 
 
 