Graphics Reference
In-Depth Information
1.3 Lists with Open Addressing ( Open-alloc )
In the previous section, a cell was allocated by incrementing a global counter,
and each cell in a list had to store the index of the next cell in that list. This is
the traditional linked-list data structure.
In this section, we describe a different way to allocate cells in the main buffer
and traverse the list of fragments associated with a given pixel. This technique
frees us from storing the index of the next cell, allowing more fragments to fit in
the same amount of memory. It does come with some disadvantages as well, in
particular the inability to store more that 32 bits of data per fragment.
We start with a general introduction of this allocation strategy and then
introduce the two techniques based on it, Post-Open and Pre-Open.
1.3.1 Insertion
For each pixel p , we fix a sequence of cell positions in the main buffer, ( h ( p,i )) i≥ 1
and call it a probe sequence . The function h is defined as
h ( p,i )= p + o i mod H,
or, in C speak, (p + offsets[i]) % gBufferSize .
where H = gBufferSize is the size of the main buffer. The sequence ( o i ) i≥ 1
should ideally be a random permutation of the set of integers [0 ..H
1], so that
the probe sequence ( h ( p,i )) i≥ 1 covers all the cells of the main buffer. We call
( o i ) i≥ 1 the sequence of offsets . In practice this sequence is represented with a
fixed-length array of random integers, which we regenerate before each frame.
The fragments associated with pixel p are stored in the main buffer at locations
indicated by the probe sequence. When a fragment covering pixel p is stored
at position h ( p,i ), we say that it has age i ,orthat i is the age of this stored
fragment.
There are two interesting consequences to using the probe sequence defined
by function h . First, note that the sequence of offsets is independent of the pixel
position p . This means that the probe sequence for pixel q is a translation of the
probe sequence for pixel p by the vector q
p . During the rasterization, neigh-
boring threads handle neighboring pixels and in turn access neighboring memory
locations as each is traversing the probe sequence of its corresponding pixel. This
coherence in the memory access pattern eases the stress of the GPU memory
bus and increases memory bandwidth utilization. It was already exploited by
Garcıa et al. for fast spatial hashing [Garcıa et al. 11].
Second, assuming that H is greater than the total number of screen pixels,
then the function h becomes invertible in the sense that knowing h ( p,i )and i is
Search WWH ::




Custom Search