We could also manually take multiple samples to achieve the same result.
Basically, instead of sampling quads, we sample elongated rectangles at grazing
angles.
We saw earlier in Section 4.4.5 that for complicated BRDF models we would
need to pre-compute a 2D table of local reflection vectors and cone angles. A
texture format suited for this is R16G16B16A16: the RGB channels would store the
local reflection vector, and the alpha channel would store either a single isotropic
cone-angle extent or two anisotropic vertical and horizontal cone-angle extents.
These two anisotropic values would determine how many extra samples we take
vertically to approximate an elongated rectangle and stretch the reflections.
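As a rough illustration of the last step, the extra vertical sample count can be derived from the ratio of the two cone extents. The sketch below is an assumption, not the chapter's code; the function name, the clamping, and the ceiling heuristic are all illustrative choices.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: derive the number of extra vertical samples from the
// anisotropic vertical/horizontal cone-angle extents read out of the
// precomputed table. The more elongated the cone (coneV >> coneH), the more
// vertical samples we take to stretch the reflection; maxSamples caps the cost.
int ExtraVerticalSamples(float coneV, float coneH, int maxSamples)
{
    if (coneH <= 0.0f)
        return 0;

    // Elongation of the footprint: 1.0 means isotropic, no extra samples.
    float elongation = coneV / coneH;
    int samples = static_cast<int>(std::ceil(elongation)) - 1;
    return std::clamp(samples, 0, maxSamples);
}
```

For an isotropic cone the function returns zero and we fall back to the single-sample path; only strongly grazing, elongated footprints pay for the additional fetches.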
4.7 Optimizations
4.7.1 Combining Linear and Hi-Z Traversal
One drawback of the Hierarchical-Z traversal is that it descends to the lower
hierarchy levels whenever the ray travels close to a surface. Evaluating the
entire Hierarchical-Z traversal algorithm for such small steps is more expensive
than a simple linear search with the same step size. Unfortunately, the ray
starts out immediately close to a surface: the very surface we are reflecting the
original ray from. Taking a few linear-search steps at the beginning is therefore
a great optimization; it moves the ray away from the surface and then lets the
Hierarchical-Z traversal algorithm do its job of taking the big steps.
If the linear search finds an intersection, we can early-out in the shader
code with a dynamic branch and skip the entire Hi-Z traversal phase. It is also
worthwhile to end the Hi-Z traversal at a much earlier level, such as 1 or 2, and
then finish with another linear search. The ending level can be calculated from
the distance to the camera: the farther away the pixel is, the less detail it needs
because of perspective, so stopping much earlier is going to give a boost in
performance.
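A minimal sketch of the distance-based stopping level might look like the following. This is an assumption about how one could map view distance to a coarser Hi-Z end level; the function name, the [near, far] normalization, and the maximum level of 2 are illustrative, not the chapter's implementation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: pick the Hi-Z level at which to stop the hierarchical
// traversal and hand over to a final linear search. Farther pixels need less
// detail under perspective, so they stop at a coarser (higher) level.
//
// Overall flow this plugs into (per the text):
//   1. a few linear steps to push the ray off its originating surface,
//   2. early-out if that linear search already intersects,
//   3. Hi-Z traversal down to HiZStopLevel(...),
//   4. a short linear search to finish.
int HiZStopLevel(float viewDistance, float nearDist, float farDist)
{
    // Normalize the distance into [0, 1] over the chosen range.
    float t = std::clamp((viewDistance - nearDist) / (farDist - nearDist),
                         0.0f, 1.0f);

    // Blend from the finest stop level (0) up to level 2 with distance.
    return static_cast<int>(std::floor(t * 2.0f + 0.5f));
}
```

Nearby pixels then traverse all the way down to level 0, while distant pixels stop at level 2 and let the cheap final linear search resolve the hit.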
4.7.2 Improving Fetch Latency
Partially unrolling dynamic loops that issue dependent texture fetches tends to
improve performance in fetch/latency-bound algorithms. Instead of issuing one
fetch per loop iteration, we pre-fetch the work for the next N iterations. We can
do this because the ray follows a deterministic path. However, there is a point
where pre-fetching starts to hurt performance: register usage rises, and using
more registers means fewer groups of threads can run in parallel. A good starting
point is N = 4. With that value, a speedup of 2×-3× was measured on a regular
linear tracing algorithm on both NVIDIA and AMD hardware. The numbers
appearing later in this chapter do not include these improvements because they
were not tested on a Hi-Z tracer.
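The shape of such a partially unrolled loop can be sketched on the CPU as follows. This is an assumed simplification: a linear tracer over a 1D depth buffer with an unroll factor of N = 4, where all four samples are fetched before any comparison consumes them. On a GPU this lets the dependent fetches be issued back to back, hiding their latency; on a CPU the code only illustrates the structure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a partially unrolled linear trace. Returns the index
// of the first step whose stored depth lies in front of (<=) rayDepth, or -1
// if the ray never intersects.
int TraceLinearUnrolled(const std::vector<float>& depth, float rayDepth)
{
    const std::size_t N = 4; // unroll / pre-fetch factor
    std::size_t i = 0;

    while (i + N <= depth.size())
    {
        // Issue all N fetches up front; the dependent comparisons follow.
        float d0 = depth[i + 0], d1 = depth[i + 1];
        float d2 = depth[i + 2], d3 = depth[i + 3];

        if (d0 <= rayDepth) return static_cast<int>(i + 0);
        if (d1 <= rayDepth) return static_cast<int>(i + 1);
        if (d2 <= rayDepth) return static_cast<int>(i + 2);
        if (d3 <= rayDepth) return static_cast<int>(i + 3);

        i += N;
    }

    // Tail loop for the remaining (fewer than N) steps.
    for (; i < depth.size(); ++i)
        if (depth[i] <= rayDepth) return static_cast<int>(i);

    return -1;
}
```

Raising N past this point trades latency hiding for register pressure, which is exactly the occupancy cost described above.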