bandwidth because the entire scene will be processed for each pixel. If we use a
more sophisticated data structure, then we likely will reduce bandwidth but also
reduce memory coherence. Furthermore, adjacent pixels likely sample the same
triangle, but by the time we have iterated through to testing that triangle again it
is likely to have been flushed from the cache. A popular low-level optimization
for a ray tracer is to trace a bundle of rays called a ray packet through adjacent
pixels. These rays likely traverse the scene data structure in a similar way, which
increases memory coherence. On a SIMD processor a single thread can trace an
entire packet simultaneously. However, packet tracing suffers from computational
coherence problems. Sometimes different rays in the same packet progress to dif-
ferent parts of the scene data structure or branch different ways in the ray intersec-
tion test. In these cases, processing multiple rays simultaneously on a thread gives
no advantage because memory coherence is lost or both sides of the branch must
be taken. As a result, fast ray tracers are often designed to trace packets through
very sophisticated data structures. They are typically limited not by computation
but by memory performance problems arising from resultant cache inefficiency.
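As a rough sketch of how a ray packet might look in code (the names RayPacket, Aabb, and intersectPacket are illustrative assumptions, not any particular renderer's interface), the fragment below stores a small packet in structure-of-arrays form and tests every ray against one bounding box of the scene data structure, so that all lanes share a single memory access pattern:

#include <array>
#include <algorithm>
#include <utility>
#include <cstdint>

// A fixed-size packet of rays in structure-of-arrays layout, so that
// adjacent lanes correspond to adjacent pixels and can be processed
// together by SIMD hardware.
constexpr int PacketSize = 8;

struct RayPacket {
    std::array<float, PacketSize> ox, oy, oz;          // ray origins
    std::array<float, PacketSize> invDx, invDy, invDz; // 1 / direction, precomputed
};

struct Aabb { float lo[3]; float hi[3]; };

// Slab test of every ray in the packet against one bounding box.
// Returns a bitmask of the rays that hit, so traversal descends into a
// node whenever any lane is active.  Each lane performs the same sequence
// of operations, which is what makes the loop amenable to SIMD execution.
uint32_t intersectPacket(const RayPacket& p, const Aabb& box, float tMax) {
    uint32_t hitMask = 0;
    for (int i = 0; i < PacketSize; ++i) {
        float t0 = 0.0f, t1 = tMax;
        const float o[3]   = { p.ox[i], p.oy[i], p.oz[i] };
        const float inv[3] = { p.invDx[i], p.invDy[i], p.invDz[i] };
        for (int a = 0; a < 3; ++a) {
            float tNear = (box.lo[a] - o[a]) * inv[a];
            float tFar  = (box.hi[a] - o[a]) * inv[a];
            if (tNear > tFar) std::swap(tNear, tFar);
            t0 = std::max(t0, tNear);
            t1 = std::min(t1, tFar);
        }
        if (t0 <= t1) hitMask |= (1u << i);
    }
    return hitMask;
}

When the mask comes back zero, the whole packet skips a subtree together; when the rays disagree, both subtrees must still be visited, which is exactly the loss of coherence described above.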
Because frame buffer storage per pixel is often much smaller than scene struc-
ture per triangle, the rasterizer has an inherent memory performance advantage
over the ray tracer. A rasterizer reads each triangle into memory and then pro-
cesses it to completion, iterating over many pixels. Those pixels must be adjacent
to each other in space. For a row-major image, if we iterate along rows, then
the pixels covered by the triangle are also adjacent in memory and we will have
excellent coherence and fairly low memory bandwidth in the inner loop. Further-
more, we can process multiple adjacent pixels, either horizontally or vertically,
simultaneously on a SIMD architecture. These will be highly memory and branch
coherent because we're stepping along a single triangle (a sketch of such an inner loop appears below).

There are many variations on ray casting and rasterization that improve their asymptotic behavior. However,
these algorithms have historically been applied to only millions of triangles and
pixels. At those sizes, constant factors like coherence still drive the performance
of the algorithms, and rasterization's superior coherence properties have made it
preferred for high-performance rendering. The cost of this coherence is that after
even the few optimizations needed to get real-time performance from a raster-
izer, the code becomes so littered with bit-manipulation tricks and highly derived
terms that the elegance of a simple ray cast seems very attractive from a software
engineering perspective. This difference is only magnified when we make the ren-
dering algorithm more sophisticated. The conventional wisdom is that ray-tracing
algorithms are elegant and easy to extend but are hard to optimize, and rasteri-
zation algorithms are very efficient but are awkward and hard to augment with
new features. Of course, one can always make a ray tracer fast and ugly (which
packet tracing succeeds at admirably) and a rasterizer extensible but slow (e.g.,
Pixar's RenderMan, which was used extensively in film rendering over the past
two decades).
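To make the rasterizer's inner loop concrete, here is a minimal sketch (the edge-function formulation and the names Framebuffer, edgeFunction, and rasterizeTriangle are assumptions for illustration, not this text's own code) that scans a triangle's screen-space bounding box in row-major order, so the pixels it touches on each row are contiguous in memory:

#include <algorithm>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Signed-area edge function: its sign tells which side of edge a->b the
// point c lies on, and it is zero on the edge itself.
inline float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Row-major frame buffer: pixel (x, y) lives at index y * width + x, so
// stepping x along a row touches consecutive memory locations.
struct Framebuffer {
    int width, height;
    std::vector<float> color;  // single channel, for brevity
    Framebuffer(int w, int h) : width(w), height(h), color(w * h, 0.0f) {}
};

// Iterate over the triangle's bounding box, rows in the outer loop and
// columns in the inner loop, writing a constant shade wherever the pixel
// center is covered.  Because the image is row major, the inner loop
// reads and writes adjacent addresses, which is the coherence advantage
// described above.
void rasterizeTriangle(Framebuffer& fb, Vec2 v0, Vec2 v1, Vec2 v2, float shade) {
    int x0 = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    int x1 = std::min(fb.width  - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    int y0 = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    int y1 = std::min(fb.height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            Vec2 p{ x + 0.5f, y + 0.5f };
            float w0 = edgeFunction(v1, v2, p);
            float w1 = edgeFunction(v2, v0, p);
            float w2 = edgeFunction(v0, v1, p);
            // The three edge functions share a sign when p is inside,
            // regardless of the triangle's winding order.
            if ((w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0)) {
                fb.color[y * fb.width + x] = shade;
            }
        }
    }
}

A production rasterizer replaces the per-pixel edge evaluations with incremental updates and steps over several adjacent pixels per SIMD instruction, which is where the bit-manipulation tricks mentioned above begin to accumulate.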
15.8.3 Early-Depth-Test Example
One simple optimization that can significantly improve performance, yet only
minimally affects clarity, is an early depth test. Both the rasterizer and the ray
tracer as structured above sometimes shaded a point, only to find later that some
other point was closer to the viewer. As an optimization, we might first find the
closest point before doing any shading, and then go back and shade the point that
was closest. In ray casting, this means finding the closest intersection along each
ray before invoking the shading computation.
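A minimal sketch of that restructuring in a ray caster follows; it uses spheres rather than triangles purely to keep the example self-contained, and the names Vec3, Ray, Sphere, and shadeClosest are placeholders rather than this text's interfaces:

#include <cmath>
#include <limits>
#include <vector>

struct Vec3   { float x, y, z; };
struct Ray    { Vec3 origin, dir; };            // dir assumed normalized
struct Sphere { Vec3 center; float radius; float shade; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Smallest positive ray parameter t at which the ray hits the sphere,
// or a negative value if it misses.
static float intersect(const Ray& r, const Sphere& s) {
    Vec3 oc = sub(r.origin, s.center);
    float b = dot(oc, r.dir);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    float t = -b - std::sqrt(disc);
    return (t > 0.0f) ? t : -1.0f;
}

// Early-depth-test structure: find the closest hit over the whole scene
// first, then "shade" exactly once (here shading is just the sphere's
// stored value, standing in for a full shading computation), instead of
// shading every hit and overwriting it when a nearer one turns up.
static float shadeClosest(const Ray& ray, const std::vector<Sphere>& scene) {
    float closestT = std::numeric_limits<float>::infinity();
    const Sphere* closest = nullptr;
    for (const Sphere& s : scene) {
        float t = intersect(ray, s);
        if (t > 0.0f && t < closestT) { closestT = t; closest = &s; }
    }
    return closest ? closest->shade : 0.0f;  // 0 represents the background
}

The same restructuring applies to the rasterizer's per-pixel loop: resolve the depth comparison first and run the shading computation only for the sample that survives it.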
 
 