Per-Pixel Lists for Single Pass A-Buffer - GPU Pro: Advanced Rendering Techniques

Graphics Reference

In-Depth Information

1.3 Lists with Open Addressing ( Open-alloc )

In the previous section, a cell was allocated by incrementing a global counter,

and each cell in a list had to store the index of the next cell in that list. This is

the traditional linked-list data structure.

In this section, we describe a different way to allocate cells in the main buffer

and traverse the list of fragments associated with a given pixel. This technique

frees us from storing the index of the next cell, allowing more fragments to fit in

the same amount of memory. It does come with some disadvantages as well, in

particular the inability to store more that 32 bits of data per fragment.

We start with a general introduction of this allocation strategy and then

introduce the two techniques based on it, Post-Open and Pre-Open.

1.3.1 Insertion

For each pixel p , we fix a sequence of cell positions in the main buffer, ( h ( p,i )) i≥ 1

and call it a probe sequence . The function h is defined as

h ( p,i )= p + o i mod H,

or, in C speak, (p + offsets[i]) % gBufferSize .

where H = gBufferSize is the size of the main buffer. The sequence ( o i ) i≥ 1

should ideally be a random permutation of the set of integers [0 ..H

1], so that

the probe sequence ( h ( p,i )) i≥ 1 covers all the cells of the main buffer. We call

( o i ) i≥ 1 the sequence of offsets . In practice this sequence is represented with a

fixed-length array of random integers, which we regenerate before each frame.

The fragments associated with pixel p are stored in the main buffer at locations

indicated by the probe sequence. When a fragment covering pixel p is stored

at position h ( p,i ), we say that it has age i ,orthat i is the age of this stored

fragment.

There are two interesting consequences to using the probe sequence defined

by function h . First, note that the sequence of offsets is independent of the pixel

position p . This means that the probe sequence for pixel q is a translation of the

probe sequence for pixel p by the vector q

−

p . During the rasterization, neigh-

boring threads handle neighboring pixels and in turn access neighboring memory

locations as each is traversing the probe sequence of its corresponding pixel. This

coherence in the memory access pattern eases the stress of the GPU memory

bus and increases memory bandwidth utilization. It was already exploited by

Garcıa et al. for fast spatial hashing [Garcıa et al. 11].

Second, assuming that H is greater than the total number of screen pixels,

then the function h becomes invertible in the sense that knowing h ( p,i )and i is

−

Search WWH ::

Custom Search

Home