4.7.3 Interleaved Spatial Coherence and Signal Reconstruction
Because most of our rays are spatially coherent, we can shoot rays every other
pixel—so-called interleaved sampling—and then apply some sort of signal recon-
struction filter from sampling theory. This works very well with rough reflections
since the result tends to have low frequency, which is ideal for signal reconstruc-
tion. This was tested on linear tracing-based algorithms, and a performance
increase of about 3× to 4× was achieved. The interleaving pattern was twice
horizontally, twice vertically. These improvements were also not tested on a Hi-Z
tracer, so the numbers presented later do not include these either.
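The idea can be illustrated with a small CPU-side sketch. It is not the chapter's implementation: the function names, the dictionary of samples, and the choice of a tent (bilinear) reconstruction filter are all assumptions made for illustration, and `trace` stands in for whatever per-pixel ray tracer is used.

```python
def trace_interleaved(trace, width, height, step=2):
    """Trace only every `step`-th pixel in x and y (one pixel in four for step=2).

    `trace(x, y)` is a hypothetical callback returning a scalar reflection value.
    """
    return {(x, y): trace(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)}


def reconstruct(samples, width, height, step=2):
    """Fill in the untraced pixels with a tent (bilinear) filter.

    This is one possible reconstruction filter, chosen here as an assumption;
    it suits low-frequency signals such as rough reflections.
    """
    max_x = ((width - 1) // step) * step   # last traced column
    max_y = ((height - 1) // step) * step  # last traced row
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0 = (y // step) * step
        y1 = min(y0 + step, max_y)          # clamp at the image border
        fy = (y - y0) / step if y1 > y0 else 0.0
        for x in range(width):
            x0 = (x // step) * step
            x1 = min(x0 + step, max_x)
            fx = (x - x0) / step if x1 > x0 else 0.0
            top = samples[(x0, y0)] * (1 - fx) + samples[(x1, y0)] * fx
            bottom = samples[(x0, y1)] * (1 - fx) + samples[(x1, y1)] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out
```

With a smooth input, the pixels that were skipped are recovered closely from their traced neighbors, which is why the approach pairs well with rough reflections; sharp mirror-like reflections would need a more careful filter.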
4.7.4 Cross-Bilateral Upsampling or Temporal Super-Sampling
Since we run the ray tracing at half resolution, we need a smart upsampling
scheme to make up for the low number of pixels. A cross-bilateral image
upsampling algorithm is a perfect fit for this kind of task [Kopf et al. 07], but a
temporal super-sampling algorithm is even better: after four frames we will have
full-resolution traced results using the temporal re-projection that was explained
earlier.
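As a rough sketch of why four frames are enough: half resolution in both axes means one traced pixel per 2 × 2 block, so cycling through the four positions in the block covers every full-resolution pixel. The code below assumes a static camera and scene so that re-projection is the identity; the offset ordering, the `history` buffer, and the function name are hypothetical.

```python
# Hypothetical ordering of the four sub-pixel positions inside each 2x2 block.
OFFSETS = [(0, 0), (1, 1), (1, 0), (0, 1)]


def accumulate_frame(history, trace, frame_index):
    """Trace one pixel of every 2x2 block and store it in the full-res history.

    Assumes a static camera and scene, so no re-projection of `history` is
    needed; a real renderer would re-project the history buffer first.
    """
    height, width = len(history), len(history[0])
    ox, oy = OFFSETS[frame_index % 4]
    for y in range(oy, height, 2):
        for x in range(ox, width, 2):
            history[y][x] = trace(x, y)
    return history
```

Each call costs the same as one half-resolution trace; after calls with `frame_index` 0 through 3, every entry of `history` holds a traced value.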
For the cross-bilateral upsampler, the full-resolution depth buffer would be
an input together with the half-resolution reflection color buffer. The algorithm
would upsample the reflection color buffer to full resolution while preserving
silhouettes and hard edges. It is much faster to calculate the reflections
at half resolution than at full resolution. However, to compose the image back
onto the original full-resolution screen, we need to scale it up while preserving
the hard edges, and that is exactly what cross-bilateral upsampling is good for.
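A minimal CPU sketch of such an upsampler, in the spirit of [Kopf et al. 07], might look as follows. It is an illustration under assumptions, not the demo's shader: colors are scalars, the spatial kernel is a 2 × 2 tent, the range kernel is a Gaussian on depth difference, and `sigma_depth`, `scale`, and the function name are made up for the example.

```python
import math


def cross_bilateral_upsample(color_lo, depth_hi, sigma_depth=0.1, scale=2):
    """Upsample `color_lo` to the resolution of `depth_hi`, guided by depth.

    Each low-res tap is weighted by a spatial tent weight times a Gaussian of
    the depth difference between the output pixel and the full-res depth at
    the tap's position, so colors do not bleed across depth edges.
    """
    hi_h, hi_w = len(depth_hi), len(depth_hi[0])
    lo_h, lo_w = len(color_lo), len(color_lo[0])
    out = [[0.0] * hi_w for _ in range(hi_h)]
    for y in range(hi_h):
        for x in range(hi_w):
            fx, fy = x / scale, y / scale      # position in low-res coordinates
            bx, by = int(fx), int(fy)
            total, weight_sum = 0.0, 0.0
            for qy in (by, by + 1):
                for qx in (bx, bx + 1):
                    cx, cy = min(qx, lo_w - 1), min(qy, lo_h - 1)
                    # Spatial tent weight (the four weights sum to 1).
                    ws = max(0.0, 1 - abs(fx - qx)) * max(0.0, 1 - abs(fy - qy))
                    # Range weight from the full-res depth buffer.
                    guide = depth_hi[min(cy * scale, hi_h - 1)][min(cx * scale, hi_w - 1)]
                    dd = depth_hi[y][x] - guide
                    wr = math.exp(-(dd * dd) / (2 * sigma_depth ** 2))
                    # The small epsilon falls back to plain bilinear filtering
                    # when no tap matches the pixel's depth.
                    w = ws * (wr + 1e-6)
                    total += w * color_lo[cy][cx]
                    weight_sum += w
            out[y][x] = total / weight_sum
    return out
```

On a scene with a hard depth edge, a pixel just on the near side of the edge takes its color almost entirely from near-side taps, where plain bilinear filtering would blend both sides.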
As an alternative while upsampling, one could append the pixels at depth
discontinuities to an append/consume buffer and re-trace only those pixels at
high resolution later for higher quality. This was not tested.
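Since this variant was not tested, the following is only a sketch of the idea on the CPU. A Python list stands in for the GPU append/consume buffer, and the 4-neighbor depth test, the `threshold` value, and both function names are assumptions.

```python
def collect_retrace_pixels(depth_hi, threshold=0.5):
    """Append every full-res pixel that sits on a depth discontinuity.

    A pixel qualifies if any 4-neighbor differs in depth by more than
    `threshold` (a hypothetical, scene-dependent value).
    """
    height, width = len(depth_hi), len(depth_hi[0])
    pending = []  # stands in for an append/consume buffer
    for y in range(height):
        for x in range(width):
            d = depth_hi[y][x]
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                inside = 0 <= nx < width and 0 <= ny < height
                if inside and abs(depth_hi[ny][nx] - d) > threshold:
                    pending.append((x, y))
                    break
    return pending


def retrace(color_hi, pending, trace):
    """Consume the list and overwrite those pixels with full-res traced values."""
    for x, y in pending:
        color_hi[y][x] = trace(x, y)
    return color_hi
```

Only the pixels along edges are traced a second time, so the extra cost scales with the amount of silhouette in the frame rather than with the full pixel count.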
4.8 Performance
The demo runs at half resolution, meaning 960 × 540, and it runs very fast:
0.35-0.39 ms on NVIDIA GTX TITAN,
0.70-0.80 ms on NVIDIA GTX 670,
0.80-0.90 ms on AMD 7950.
These timings cover the Hi-Z ray-marching and cone-tracing passes combined.
The demo is memory-latency bound: the memory unit is 80-90% active,
which leaves little to no room for the ALUs to work because they sit
idle waiting for fetches to complete.