Hi-Z Screen-Space Cone-Traced Reflections - GPU Pro: Advanced Rendering Techniques

4.7.3 Interleaved Spatial Coherence and Signal Reconstruction

Because most of our rays are spatially coherent, we can shoot rays at every other pixel (so-called interleaved sampling) and then apply a signal reconstruction filter from sampling theory to fill in the pixels that were skipped. This works very well with rough reflections, since the result tends to be low frequency, which is ideal for signal reconstruction. This was tested on linear tracing-based algorithms, and a performance increase of about 3×-4× was achieved. The interleaving pattern was twice horizontally, twice vertically. These improvements were also not tested on a Hi-Z tracer, so the numbers presented later do not include them either.
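As a rough illustration of interleaved sampling followed by reconstruction, the sketch below evaluates a tracing function at every second pixel in each direction and fills in the remaining pixels by bilinear interpolation. The names `trace_interleaved` and `reconstruct`, the `trace(x, y)` callback, and the choice of a bilinear filter are assumptions made for this example; the text does not say which reconstruction filter was used.

```python
def trace_interleaved(trace, width, height, step=2):
    """Evaluate trace(x, y) only at every `step`-th pixel in x and y."""
    return {(x, y): trace(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)}


def reconstruct(samples, width, height, step=2):
    """Fill untraced pixels by bilinear interpolation of traced neighbors.

    The bilinear filter is an assumption for illustration only.
    """
    last_x = (width - 1) // step * step   # last traced column
    last_y = (height - 1) // step * step  # last traced row

    def fetch(x, y):
        # Clamp to the last traced row/column at the image border.
        return samples[(min(x, last_x), min(y, last_y))]

    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0 = y // step * step
        fy = (y - y0) / step
        for x in range(width):
            x0 = x // step * step
            fx = (x - x0) / step
            top = (1 - fx) * fetch(x0, y0) + fx * fetch(x0 + step, y0)
            bottom = ((1 - fx) * fetch(x0, y0 + step)
                      + fx * fetch(x0 + step, y0 + step))
            image[y][x] = (1 - fy) * top + fy * bottom
    return image
```

With `step=2` only a quarter of the pixels are traced. A smooth (low-frequency) signal is recovered closely in the interior, which matches the observation that rough reflections suit this scheme; sharp reflections would lose detail under such a filter.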

4.7.4 Cross-Bilateral Upsampling or Temporal Super-Sampling

Since we run the ray tracing at half-resolution, we do need a smart upsampling

scheme to make up for the low number of pixels. A cross-bilateral image up-

sampling algorithm is a perfect fit for this kind of a task [Kopf et al. 07], but a

temporal super-sampling algorithm is even better: after four frames we will have

full-resolution traced results using the temporal re-projection that was explained

earlier.
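The "four frames" figure follows from tracing one pixel out of every 2 × 2 block per frame. A minimal sketch of that accumulation is shown below, assuming a static camera and scene so that no re-projection is needed; the offset order in `OFFSETS` and the names `accumulate` and `trace` are hypothetical and not taken from the demo.

```python
# Hypothetical order in which the four pixels of each 2x2 block are traced.
OFFSETS = [(0, 0), (1, 1), (1, 0), (0, 1)]


def accumulate(history, trace, frame_index):
    """Trace one pixel per 2x2 block this frame and store it in `history`.

    `history` is a full-resolution buffer (list of rows). Assumes a static
    camera and scene; a real implementation would re-project the history
    buffer before reusing it.
    """
    ox, oy = OFFSETS[frame_index % 4]
    for y in range(oy, len(history), 2):
        for x in range(ox, len(history[0]), 2):
            history[y][x] = trace(x, y)
    return history
```

Each call traces a quarter of the full-resolution pixels, the same cost as a half-resolution pass, and after four consecutive calls every pixel of `history` holds a traced value.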

For the cross-bilateral upsampler, the full-resolution depth buffer would be an input together with the half-resolution reflection color buffer. The algorithm would upsample the reflection color buffer to full resolution while preserving silhouettes and hard edges. It is much faster and cheaper to calculate the reflections at half-resolution than at full-resolution. However, to compose the image back onto the original screen at full resolution, we need to scale it up while preserving the hard edges, and that is exactly what the cross-bilateral upsampling algorithm is good for.
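The idea can be sketched on the CPU as follows: each full-resolution pixel blends its four nearest half-resolution samples, with the usual bilinear weights multiplied by a depth-similarity term so that samples from across a depth edge contribute almost nothing. This is a simplified single-channel sketch in the spirit of [Kopf et al. 07], not the demo's shader; the Gaussian depth weight, the `sigma_depth` parameter, and the convention that a half-resolution sample represents the top-left pixel of its 2 × 2 block are all assumptions.

```python
import math


def cross_bilateral_upsample(color_half, depth_full, sigma_depth=0.05):
    """Upsample `color_half` to the size of `depth_full`, guided by depth."""
    h_full, w_full = len(depth_full), len(depth_full[0])
    h_half, w_half = len(color_half), len(color_half[0])
    out = [[0.0] * w_full for _ in range(h_full)]
    for y in range(h_full):
        for x in range(w_full):
            # Pixel center expressed in half-resolution texel coordinates.
            u = (x + 0.5) / 2 - 0.5
            v = (y + 0.5) / 2 - 0.5
            x0, y0 = math.floor(u), math.floor(v)
            fx, fy = u - x0, v - y0
            total, weight_sum = 0.0, 0.0
            for dy, wy in ((0, 1 - fy), (1, fy)):
                for dx, wx in ((0, 1 - fx), (1, fx)):
                    sx = min(max(x0 + dx, 0), w_half - 1)
                    sy = min(max(y0 + dy, 0), h_half - 1)
                    # Assumed: the half-res sample represents the top-left
                    # full-res pixel of its 2x2 block.
                    d_sample = depth_full[min(2 * sy, h_full - 1)][min(2 * sx, w_full - 1)]
                    dz = depth_full[y][x] - d_sample
                    # Bilinear weight times Gaussian depth-similarity weight.
                    w = wx * wy * math.exp(-(dz * dz) / (2 * sigma_depth ** 2))
                    total += w * color_half[sy][sx]
                    weight_sum += w
            if weight_sum > 1e-8:
                out[y][x] = total / weight_sum
            else:
                # No sample has a similar depth: fall back to the nearest one.
                out[y][x] = color_half[min(y // 2, h_half - 1)][min(x // 2, w_half - 1)]
    return out
```

Across a hard depth edge, plain bilinear filtering would bleed reflection color from one surface onto the other; the depth term suppresses that, which is the edge-preserving behavior described above.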

While upsampling, one could also take another approach: append the pixels at depth discontinuities to an append/consume buffer and re-trace only those pixels at high resolution later for higher quality. This was not tested.
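Since this variant was not tested, the following is only a sketch of how the candidate pixels might be gathered. A Python list stands in for the GPU append/consume buffer, and the neighbor test, the `threshold` value, and the name `collect_discontinuity_pixels` are assumptions for illustration.

```python
def collect_discontinuity_pixels(depth, threshold=0.1):
    """Return (x, y) for pixels whose depth differs sharply from a neighbor.

    `threshold` is a hypothetical tuning value; the returned list stands in
    for an append/consume buffer whose entries would be re-traced later.
    """
    height, width = len(depth), len(depth[0])
    pending = []
    for y in range(height):
        for x in range(width):
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                inside = 0 <= nx < width and 0 <= ny < height
                if inside and abs(depth[y][x] - depth[ny][nx]) > threshold:
                    pending.append((x, y))
                    break  # one append per pixel is enough
    return pending
```

Only the pixels in the returned list would then be re-traced at full resolution, while all other pixels keep their upsampled value.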

4.8 Performance

The demo runs at half-resolution, meaning 960 × 540, and it's running super-fast:

• 0.35-0.39 ms on NVidia GTX TITAN,

• 0.70-0.80 ms on NVidia GTX 670,

• 0.80-0.90 ms on AMD 7950.

The timings cover the Hi-Z Ray-Marching and Cone-Tracing passes combined.

The demo is memory latency bound, and the memory unit is 80-90% active,

which gives little to no room for our ALU units to work because they just sit

there waiting for a fetch to complete.
