1.7.1 3D Scene Rendering
We developed a first application for rendering transparent, textured scenes. It is included in the companion source code (bin/seethrough.exe). Figure 1.3 shows a 3D rendering of a large scene with textures and transparency, and gives the timing breakdown for each pass and each technique, as well as their memory cost.
1.7.2 Benchmarking
For benchmarking we developed an application rendering transparent, front-facing quads in orthographic projection. The position and depth of the quads are randomized and change every frame. All measurements are averaged over six seconds of running time. We control the size and number of quads, as well as their opacity. We use the ARB_timer_query extension to measure the time to render a frame. This includes the Clear, Build, and Render passes, as well as checking for main buffer overflow. All tests are performed on a GeForce GTX 480 and a GeForce Titan using drivers 320.49. We expect these performance numbers to change with future driver revisions due to the issues mentioned in Section 1.6. Nevertheless, our current implementation exhibits consistent performance levels across all techniques, as well as between the Fermi and Kepler architectures.
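The sketch below shows one way such per-frame GPU timing can be done with ARB_timer_query. It is a minimal illustration, not code from the companion framework: it assumes a current OpenGL context with GLEW for function loading, and the clearPass, buildPass, and renderPass callbacks are placeholders for the actual passes.

// Minimal per-frame GPU timing with ARB_timer_query (illustrative sketch).
// Assumes a current OpenGL context; GLEW is used here for function loading.
#include <GL/glew.h>

static GLuint g_timeQuery = 0;

void initTimer()
{
    glGenQueries(1, &g_timeQuery);
}

// Measures one frame: the Clear, Build, and Render passes are passed in as
// placeholder callbacks. Returns the GPU time in milliseconds.
double timeFrameMs(void (*clearPass)(), void (*buildPass)(), void (*renderPass)())
{
    glBeginQuery(GL_TIME_ELAPSED, g_timeQuery);
    clearPass();   // reset per-pixel data
    buildPass();   // insert fragments
    renderPass();  // sort/blend and output the final color
    glEndQuery(GL_TIME_ELAPSED);

    // Blocks until the result is available; acceptable for benchmarking.
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(g_timeQuery, GL_QUERY_RESULT, &elapsedNs);
    return elapsedNs / 1.0e6;
}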
The benchmarking framework is included in the companion source code (bin/benchmark.exe). The Python script runall.py launches all benchmarks.
Number of fragments. For a fixed depth complexity, the per-frame time is expected to be linear in the number of fragments. This is verified by all implementations, as illustrated in Figure 1.5. We measure this by rendering a number of quads perfectly aligned on top of each other, in randomized depth order. The number of quads controls the depth complexity. We adjust the size of the quads so that only the number of fragments varies.
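As an illustration, a hypothetical sketch of this test setup follows (the names and structure are ours, not the companion framework's): numQuads quads are stacked at the same screen position with randomized depths, so numQuads fixes the per-pixel depth complexity while quadSize controls the total fragment count.

#include <random>
#include <vector>

struct Quad { float x, y, size, depth; };

// Builds 'numQuads' screen-aligned quads stacked at the same position, each
// with a randomized depth in [0,1); regenerated every frame in the benchmark.
std::vector<Quad> makeAlignedQuads(int numQuads, float quadSize, std::mt19937& rng)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<Quad> quads;
    quads.reserve(numQuads);
    for (int i = 0; i < numQuads; ++i)
        quads.push_back({0.0f, 0.0f, quadSize, dist(rng)});
    return quads;
}
// The fragment count is roughly numQuads * quadSize * quadSize, so varying
// quadSize at a fixed numQuads changes only the number of fragments.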
Depth complexity. In this experiment we compare the overall performance for a fixed number of fragments but a varying depth complexity. As the size of the per-pixel lists increases, we expect a quadratic increase in frame rendering time. This is verified in Figure 1.6. The Pre-Open technique is the most severely impacted by the increase in depth complexity. The main reason is that the sort occurs in global memory, and each added fragment leads to a full traversal of the list via the eviction mechanism.
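This quadratic behavior can be read off a simple cost model, sketched below under the assumption stated above that each inserted fragment traverses the whole sorted per-pixel list: the i-th fragment visits on the order of i cells, so a depth complexity of d costs roughly d(d+1)/2 cell visits per pixel.

#include <cstdio>

// Toy cost model: inserting fragment i into an already-sorted list of i-1
// cells traverses on the order of i cells, giving d*(d+1)/2 visits in total.
long long perPixelCellVisits(int depthComplexity)
{
    long long visits = 0;
    for (int i = 1; i <= depthComplexity; ++i)
        visits += i;
    return visits;
}

int main()
{
    const int depths[] = {4, 8, 16, 32, 64};
    for (int d : depths)
        std::printf("depth complexity %2d -> ~%lld cell visits per pixel\n",
                    d, perPixelCellVisits(d));
    return 0;
}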
Early culling. In scenes with a mix of transparent and opaque objects, early culling limits the depth complexity per pixel. The Pre-Open and Pre-Lin techniques both allow for early culling (see Section 1.4.2). Figure 1.7 demonstrates the benefit of early culling. The threshold is set to ignore all fragments once an accumulated opacity of 0.95 is reached (1 being fully opaque).
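The sketch below is a simplified CPU-side model of such an early-culling test, not the actual shader code: a new fragment is discarded when the fragments already stored in front of it accumulate at least the 0.95 opacity threshold.

#include <vector>

struct Fragment { float depth; float opacity; };

// Returns true if a fragment at 'newDepth' can be culled: the fragments
// already in front of it (sorted front to back) accumulate enough opacity.
bool canCull(const std::vector<Fragment>& listFrontToBack,
             float newDepth, float threshold = 0.95f)
{
    float accumulated = 0.0f;
    for (const Fragment& f : listFrontToBack) {
        if (f.depth >= newDepth) break;             // only closer fragments occlude
        accumulated += (1.0f - accumulated) * f.opacity;
        if (accumulated >= threshold) return true;  // effectively opaque already
    }
    return false;
}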