our traditional deferred shading pipeline. We reconstruct reflecting surface points
using the depth and the normal stored in the G-buffer and reflect the camera rays
to spawn reflection rays.
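The reconstruction and reflection step can be illustrated with a minimal sketch. It assumes the G-buffer depth has already been converted to a distance along the camera ray; the function names and the small origin offset `eps` are hypothetical and not taken from the original implementation.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reconstruct_point(camera_pos, ray_dir, distance):
    """Surface point reached by a camera ray after travelling `distance`."""
    return tuple(o + distance * d for o, d in zip(camera_pos, ray_dir))

def reflect(incident, normal):
    """Mirror the incident direction about a unit surface normal."""
    k = 2.0 * dot(incident, normal)
    return tuple(i - k * n for i, n in zip(incident, normal))

def spawn_reflection_ray(camera_pos, ray_dir, distance, normal, eps=1e-3):
    """Build a reflection ray (origin, direction) from G-buffer style inputs."""
    ray_dir = normalize(ray_dir)
    normal = normalize(normal)
    hit = reconstruct_point(camera_pos, ray_dir, distance)
    direction = reflect(ray_dir, normal)
    # Assumption: nudge the origin along the normal to avoid self-intersection.
    origin = tuple(h + eps * n for h, n in zip(hit, normal))
    return origin, direction
```

In a shader, the same arithmetic would typically run per pixel, with the normal read from the G-buffer.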
2.4.6 Shading
We use standard deferred shading to process the closest hit point information
collected for each ray. The result is composited with the shading of the point
that spawned the ray, blended using Schlick's Fresnel approximation.
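As a rough sketch, Schlick's approximation estimates the reflectance as F = F0 + (1 - F0)(1 - cos θ)^5, where θ is the angle between the view direction and the normal. The linear blend in `composite` and the default `f0 = 0.04` are assumptions for illustration; the text does not specify the exact compositing formula.

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation of Fresnel reflectance.

    cos_theta: cosine of the angle between view direction and normal.
    f0: reflectance at normal incidence.
    """
    cos_theta = min(max(cos_theta, 0.0), 1.0)
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def composite(surface_rgb, reflection_rgb, cos_theta, f0=0.04):
    """Blend the surface shading with the shaded reflection hit.

    Assumption: a plain linear interpolation weighted by the Fresnel term.
    """
    f = schlick_fresnel(cos_theta, f0)
    return tuple((1.0 - f) * s + f * r
                 for s, r in zip(surface_rgb, reflection_rgb))
```

At grazing angles the Fresnel term approaches 1, so the reflection dominates; when the surface is viewed head-on, the result stays close to the surface shading.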
We implemented sunlight for demonstration purposes. For other, spatially
bounded light sources, it would make sense to limit the influence of these lights
to a certain subset of grid cells. Shading can then be restricted to a few lights per
cell, just like in screen-space deferred shading approaches.
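One way such a per-cell light assignment could look is sketched below. This is not part of the described implementation, which only handles sunlight; it assumes spherical light bounds, a uniform grid, and a conservative test against each sphere's bounding box. All names are hypothetical.

```python
import math

def cells_overlapping_sphere(center, radius, grid_min, cell_size, resolution):
    """Grid cells touched by the bounding box of a spherical light volume."""
    lo, hi = [], []
    for c, g in zip(center, grid_min):
        lo.append(max(0, math.floor((c - radius - g) / cell_size)))
        hi.append(min(resolution - 1, math.floor((c + radius - g) / cell_size)))
    return [(x, y, z)
            for x in range(lo[0], hi[0] + 1)
            for y in range(lo[1], hi[1] + 1)
            for z in range(lo[2], hi[2] + 1)]

def build_light_lists(lights, grid_min, cell_size, resolution):
    """Map each grid cell to the indices of the lights that may affect it.

    lights: list of (center, radius) tuples.
    """
    cell_lights = {}
    for index, (center, radius) in enumerate(lights):
        for cell in cells_overlapping_sphere(center, radius, grid_min,
                                             cell_size, resolution):
            cell_lights.setdefault(cell, []).append(index)
    return cell_lights
```

Shading a hit point would then only need to loop over the list stored for the cell that contains it.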
2.5 Results
We have implemented the described algorithm with DirectX 11 Graphics and
Compute, and we use the radix sorting implementation provided by the B40C
library implemented in CUDA 5 [Merrill and Grimshaw 11]. The algorithm is
implemented in a typical rasterization-based deferred shading engine, using ras-
terization for primary and shadow rays and using the described method for tracing
reflection rays. All tests were run on an NVIDIA GeForce GTX 560 GPU and an
Intel Core i7-2600K CPU. Images were rendered at a resolution of 1280 × 720.
The ray and voxel grids had a resolution of 128 × 128 × 128 each.
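To make the role of the radix sort more concrete, the following CPU sketch sorts ray entries by the grid cell they fall into. With 128 cells along each axis, a cell coordinate fits into 7 bits per axis, giving a 21-bit key. The key layout and function names are assumptions for illustration; the actual implementation relies on the GPU radix sort from B40C and may encode its keys differently.

```python
def cell_key(x, y, z, bits=7):
    """Pack a 3D cell coordinate into one integer key (7 bits per axis for 128 cells)."""
    return (z << (2 * bits)) | (y << bits) | x

def radix_sort_pairs(keys, values, key_bits=21, radix_bits=7):
    """Stable least-significant-digit radix sort of (key, value) pairs."""
    mask = (1 << radix_bits) - 1
    pairs = list(zip(keys, values))
    for shift in range(0, key_bits, radix_bits):
        # Distribute pairs into buckets by the current digit, preserving order.
        buckets = [[] for _ in range(1 << radix_bits)]
        for pair in pairs:
            buckets[(pair[0] >> shift) & mask].append(pair)
        pairs = [pair for bucket in buckets for pair in bucket]
    return [k for k, _ in pairs], [v for _, v in pairs]
```

After sorting, all rays that pass through the same cell are contiguous, so they can be processed together.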
We tested our method in the Crytek Sponza scene (270k tris), the Sibenik
Cathedral scene (75k tris), and a simple test scene composed of a bumpy reflective
cube and some low-poly objects (2k tris). The Sponza scene generates highly
incoherent reflection rays due to the heavy use of normal maps on the scene's
brickwork. Reflection rays were spawned everywhere except on the curtains and
the sky.
To increase the data locality of spatially adjacent geometry, we have used
the mesh optimization post-processing step provided by the Open Asset Import
Library [Gessler et al. 09]. All quantities were measured for a selection of view
points that vary in complexity by both ray coherence and surrounding geometry.
Table 2.1 shows the average time spent per frame in each of the major stages.
At less than 2 ms, conservative voxelization is rather cheap. Naive ray marching
and ray sorting are largely independent of the scene complexity, consistently
taking about 10 ms each. Intersection testing, on the other hand, is highly
sensitive to the number of triangles, as all triangles are processed linearly.
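The text does not show its intersection kernel. As a rough illustration of the per-triangle work involved, here is a minimal sketch of the well-known Möller-Trumbore ray-triangle test, together with a brute-force closest-hit loop whose cost grows linearly with the triangle count. It is a stand-in for illustration, not the method used in the described system.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore test: return the hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # Ray is parallel to the triangle plane.
        return None
    inv = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv     # First barycentric coordinate.
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv  # Second barycentric coordinate.
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def closest_hit(origin, direction, triangles):
    """Test every triangle in turn and keep the nearest hit as (t, index)."""
    best = None
    for index, (v0, v1, v2) in enumerate(triangles):
        t = ray_triangle(origin, direction, v0, v1, v2)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```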
Table 2.2 details the time spent in the intersection testing stage. Together,
stream out (GS) and emission of cell-triangle pairs (PS) in pass 1 make for a
cross-shader communication overhead of about 10 ms. This can likely be reduced
in future graphics pipelines (see Section 2.6). Intersection testing itself (pass 2)
is compute-bound. Due to our carefully chosen memory layout, this is to be