According to GPU PerfStudio 2, we have a 50% cache miss rate because of
nonlocal texture buffer accesses when traversing the Hi-Z acceleration
structure, and we also suffer from noncoherent dynamic branching, since a GPU
executes branches in lock-step. If an entire bucket of threads (a group of
32 threads on NVIDIA hardware, called a warp, or 64 on AMD hardware, called a
wavefront) does not take the same branch, then we pay the penalty of stalling
some threads until they converge again onto the same path. The penalty grows
the more often threads within a group take different branches for different
pixels.
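To make the lock-step penalty concrete, here is a minimal sketch under a simplified cost model (our assumption, not a measurement from this chapter): each pixel's ray needs some number of Hi-Z loop iterations, pixels are packed into groups of 32 (warp) or 64 (wavefront), and a group keeps issuing iterations until its slowest lane finishes. The hypothetical `lockstep_efficiency` function returns the fraction of issued lane-iterations that did useful work.

```python
from typing import Sequence


def lockstep_efficiency(iterations: Sequence[int], group_size: int) -> float:
    """Fraction of issued lane-iterations that do useful work.

    Simplified model (assumption): pixels are packed into consecutive groups
    of `group_size` lanes (32 for a warp, 64 for a wavefront), and a group
    keeps executing its loop until the lane with the most iterations is done.
    Lanes that finish early are masked off but still occupy the group.
    A trailing partial group only counts the lanes it actually contains.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    useful = 0  # iterations the lanes actually needed
    issued = 0  # iterations executed on behalf of every lane in each group
    for start in range(0, len(iterations), group_size):
        group = iterations[start:start + group_size]
        useful += sum(group)
        issued += max(group) * len(group)
    return useful / issued if issued else 1.0


# One slow ray (40 iterations) among fast ones (10 iterations each).
pixels = [10] * 31 + [40] + [10] * 32
print(lockstep_efficiency(pixels, 32))  # 0.41875: the second warp is fully coherent
print(lockstep_efficiency(pixels, 64))  # 0.26171875: all 64 lanes wait on the slow ray
```

In this toy model a single divergent pixel hurts a 64-wide wavefront more than a 32-wide warp, because more lanes sit idle while the slow lane finishes.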
One optimization that was not tried, but is mentioned by [Tevs et al. 08], is
using a 3D texture to store the Hi-Z instead of a 2D texture. According to [Tevs
et al. 08], using a 3D texture for a displacement mapping technique, where each
slice represents one level of the hierarchy (in our case, one Hi-Z level), gives
better cache hit rates and a 20% performance boost due to less L2 traffic and
more texture cache hits. Since we are memory-latency bound due to cache misses
and incoherent texture accesses while jumping up and down in the hierarchy, this
might be a good optimization to try, though it would use much more memory.
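To see why the 3D-texture layout costs more memory, the sketch below counts texels for both layouts. It assumes (our assumption, not a detail given here) that every slice of the 3D texture has the full base resolution, because all slices of a 3D texture share the same width and height, whereas a 2D mip chain halves each dimension per level. The function names and the 4-byte texel format are hypothetical.

```python
def hiz_levels(width: int, height: int) -> int:
    """Number of levels in a full mip chain, down to 1x1."""
    levels = 1
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels += 1
    return levels


def mip_chain_texels(width: int, height: int) -> int:
    """Texels in a 2D Hi-Z stored as a mip chain (each level halves w and h)."""
    total = 0
    while True:
        total += width * height
        if width == 1 and height == 1:
            return total
        width, height = max(1, width // 2), max(1, height // 2)


def slice_per_level_texels(width: int, height: int) -> int:
    """Texels in a 3D texture with one full-resolution slice per Hi-Z level
    (assumption: coarser levels are padded to the base slice size)."""
    return width * height * hiz_levels(width, height)


w, h = 1920, 1080
bytes_per_texel = 4  # hypothetical 32-bit float depth format
mip_mib = mip_chain_texels(w, h) * bytes_per_texel / 2**20
vol_mib = slice_per_level_texels(w, h) * bytes_per_texel / 2**20
print(f"2D mip chain: {mip_mib:.1f} MiB, 3D slice-per-level: {vol_mib:.1f} MiB")
```

Under these assumptions the 3D layout needs roughly eight times the memory of the 2D mip chain at 1080p, which is the trade-off against the better cache behavior reported by [Tevs et al. 08].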
4.9 Results
The presented algorithm works well, producing convincing specular and glossy
reflections, and it runs at speeds easily affordable for games. The most
noticeable detail is how reflections spread out as they get farther away from
the source, which is the selling point of the entire algorithm. (See Figures 4.27
and 4.28.)
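The growing spread with distance follows from the cone's geometry. In a simplified model (an approximation for illustration, not necessarily the exact formula the chapter derives), a cone with full apex angle θ has a footprint radius of d·tan(θ/2) at distance d along the reflection ray, so the radius grows linearly with distance, and sampling a prefiltered level near log2 of the footprint diameter blurs distant reflections more. The function names, pixel units, and the 10° cone angle are assumptions.

```python
import math


def cone_radius(distance_px: float, cone_angle_rad: float) -> float:
    """Radius of the cone's cross-section at `distance_px` along the ray."""
    return distance_px * math.tan(cone_angle_rad / 2.0)


def mip_level_for_radius(radius_px: float, max_level: int) -> float:
    """Level whose texels (2**level pixels wide) roughly match the footprint diameter."""
    diameter = 2.0 * radius_px
    if diameter <= 1.0:
        return 0.0  # footprint no wider than a pixel: sharp, full-resolution sample
    return min(math.log2(diameter), float(max_level))


angle = math.radians(10.0)  # hypothetical cone angle for a glossy material
for d in (10.0, 100.0, 400.0):
    r = cone_radius(d, angle)
    print(f"distance {d:6.1f}px -> radius {r:6.2f}px, level {mip_level_for_radius(r, 10):.2f}")
```

A perfectly specular surface corresponds to a zero cone angle, which keeps the radius at zero and always samples level 0.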
4.10 Conclusion
In this chapter we looked at Hi-Z Screen-Space Cone Tracing for computing both
specular and glossy reflections at interactive game frame rates and performance
Figure 4.27. Cone-tracing algorithm with different levels of glossiness on the tile material, giving the
appearance of diverging reflection rays. The reflection becomes more spread out the farther away it is, and it
stretches just like the phenomenon we see in the real world.