Object-Order Ray Tracing for Fully Dynamic Scenes - GPU Pro: Advanced Rendering Techniques

our traditional deferred shading pipeline. We reconstruct reflecting surface points

using the depth and the normal stored in the G-buffer and reflect the camera rays

to spawn reflection rays.
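
As a rough illustration of the ray setup described above, the following Python sketch reconstructs a surface point from one G-buffer sample and reflects the camera ray about the stored normal. It assumes that the depth is stored as a distance along a normalized camera ray and that the normal has unit length. The function names and the small origin offset eps are hypothetical and are not taken from the original implementation.

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reflect(d, n):
    """Mirror direction d about the unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return tuple(d[i] - k * n[i] for i in range(3))


def spawn_reflection_ray(camera_pos, ray_dir, depth, normal, eps=1e-3):
    """Return (origin, direction) of a reflection ray for one G-buffer sample."""
    # Reconstruct the surface point.
    # Assumption: depth is the distance along the normalized camera ray.
    hit = tuple(camera_pos[i] + depth * ray_dir[i] for i in range(3))
    # Reflect the incoming camera ray about the surface normal.
    direction = reflect(ray_dir, normal)
    # Nudge the origin along the normal to avoid immediate self-intersection
    # (an illustrative choice, not stated in the text).
    origin = tuple(hit[i] + eps * normal[i] for i in range(3))
    return origin, direction
```

In a shader, the same reflection is typically available as a built-in intrinsic, so only the position reconstruction would depend on the G-buffer layout actually used.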

2.4.6 Shading

We use standard deferred shading to process the closest hit point information

collected for each ray. The result is composited with the shading of the point

that spawned the ray, blended using Schlick's Fresnel approximation.
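
The blend described above can be sketched as follows. Schlick's approximation itself is standard; the linear interpolation between the two shading results and the default reflectance f0 = 0.04 are assumptions made for illustration, since the text does not spell out the exact compositing formula.

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation: F = f0 + (1 - f0) * (1 - cos_theta)^5."""
    # Clamp to guard against slightly out-of-range dot products.
    cos_theta = min(1.0, max(0.0, cos_theta))
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def composite(surface_rgb, reflection_rgb, cos_theta, f0=0.04):
    """Blend the shaded surface point with the shaded reflection hit point.

    cos_theta is the cosine of the angle between the view direction and the
    surface normal. The linear blend is an assumed compositing rule.
    """
    f = schlick_fresnel(cos_theta, f0)
    return tuple((1.0 - f) * s + f * r
                 for s, r in zip(surface_rgb, reflection_rgb))
```

At normal incidence (cos_theta = 1) the reflection contributes only the base reflectance f0, while at grazing angles (cos_theta near 0) the reflected color dominates.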

We implemented sunlight for demonstration purposes. For other, spatially

bounded light sources, it would make sense to limit the influence of these lights

to a certain subset of grid cells. Shading can then be restricted to few lights per

cell just like in screen-space deferred shading approaches.
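
One possible way to restrict bounded lights to grid cells is sketched below. This is not part of the described implementation, which only handles sunlight. The sketch assumes spherical light bounds given as (center, radius) pairs, a uniform cubic grid, and lights that overlap the grid; all names are hypothetical.

```python
from collections import defaultdict
from math import floor


def cell_of(p, grid_min, cell_size, res):
    """Integer cell coordinates of point p, clamped to the grid."""
    return tuple(
        min(res - 1, max(0, floor((p[i] - grid_min[i]) / cell_size)))
        for i in range(3)
    )


def bin_lights(lights, grid_min, cell_size, res):
    """Map each grid cell to the indices of lights that may affect it.

    Each light is binned by the axis-aligned box around its bounding sphere,
    which is conservative. Lights are assumed to overlap the grid; clamping
    would otherwise assign distant lights to border cells.
    """
    cells = defaultdict(list)
    for index, (center, radius) in enumerate(lights):
        lo = cell_of([c - radius for c in center], grid_min, cell_size, res)
        hi = cell_of([c + radius for c in center], grid_min, cell_size, res)
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    cells[(x, y, z)].append(index)
    return cells


def lights_for_point(p, cells, grid_min, cell_size, res):
    """Lights to evaluate when shading a hit point p."""
    return cells.get(cell_of(p, grid_min, cell_size, res), [])
```

Shading a ray hit would then loop only over the list returned by lights_for_point instead of over all lights in the scene.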

2.5 Results

We have implemented the described algorithm with DirectX 11 Graphics and

Compute, and we use the radix sorting implementation provided by the B40C

library implemented in CUDA 5 [Merrill and Grimshaw 11]. The algorithm is

implemented in a typical rasterization-based deferred shading engine, using

rasterization for primary and shadow rays and using the described method for tracing

reflection rays. All tests were run on an NVIDIA GeForce GTX 560 GPU and an

Intel Core i7-2600K CPU. Images were rendered at a resolution of 1280 × 720.

The ray and voxel grids had a resolution of 128 × 128 × 128 each.

We tested our method in the Crytek Sponza scene (270k tris), the Sibenik

Cathedral scene (75k tris), and a simple test scene composed of a bumpy reflective

cube and some low-poly objects (2k tris). The Sponza scene generates highly

incoherent reflection rays due to the heavy use of normal maps on the scene's

brickwork. Reflection rays were spawned everywhere except on the curtains and

the sky.

To increase the data locality of spatially adjacent geometry, we have used

the mesh optimization post-processing step provided by the Open Asset Import

Library [Gessler et al. 09]. All quantities were measured for a selection of view

points that vary in complexity by both ray coherence and surrounding geometry.

Table 2.1 shows the average time spent per frame in each of the major stages.

At less than 2 ms, conservative voxelization is rather cheap. Naive ray marching

and ray sorting are largely independent of the scene complexity, consistently

taking about 10 ms each. Intersection testing, on the other hand, is highly

sensitive to the number of triangles, as all triangles are processed linearly.

Table 2.2 details the time spent in the intersection testing stage. Together,

stream out (GS) and emission of cell-triangle pairs (PS) in pass 1 make for a

cross-shader communication overhead of about 10 ms. This can likely be reduced

in future graphics pipelines (see Section 2.6). Intersection testing itself (pass 2)

is compute-bound. Due to our carefully chosen memory layout, this is to be
