Graphics Reference
In-Depth Information
In the Post-sort method, sorting happens in local memory, which is faster
but limits the maximum number of fragments associated with a pixel to a few
hundred. Allocating more local memory for sorting more fragments increases
register pressure and reduces parallelism and performance.
1.1.3 Allocation Strategies
In addition to the scheduling of the sort, we examine two strategies for allocating
cells containing fragment information in the main buffer. The first one, Lin-
alloc, stores fragments in per-pixel linked-lists and allocates fresh cells linearly
from the start of the buffer to its end. Since many allocations are done concur-
rently, the address of a fresh cell is obtained by atomically incrementing a global
counter. Additional memory is necessary to store the address of the first cell
(head) of the list of fragments of each pixel. Section 1.2 details the Lin-alloc
strategy.
The second strategy that we examine, Open-alloc, is randomized and some-
what more involved. To each pixel p we associate a pseudo-random sequence of
cell positions in the main buffer: ( h ( p,i )) i≥ 1 ,for i ranging over the integers. In
the spirit of the “open addressing” techniques for hash tables, the cells at posi-
tions h ( p,i ) are examined by increasing value of i until an empty one is found.
A non-empty cell in this sequence may store another fragment associated with
pixel p or with a different pixel q . Such a collision between fragments must be
detected and handled correctly. Section 1.3 details the Open-alloc strategy.
The combination of two allocation strategies (Lin-alloc and Open-alloc)
with two schedules for sorting (Post-sort and Pre-sort) gives us four vari-
ations for building an A-buffer: Post-Lin (Sections 1.2.1 and 1.2.2), Pre-Lin
(Section 1.2.3), Post-Open (Section 1.3.2) and Pre-Open (Section 1.3.3).
Section 1.4.1 details how fragments are sorted in local memory in the Ren-
der pass of the Post-sort method. Some memory management issues, including
buffer resizing, are addressed in Section 1.5, and information about our implemen-
tation is given in Section 1.6. In Section 1.7, we compare these four variations,
as implemented on a GeForce 480 and a GeForce Titan.
1.2 Linked Lists with Pointers ( Lin-alloc )
The first two approaches we describe construct linked lists in each pixel, allocating
data for new fragments linearly in the main buffer. A single cell contains the depth
of the fragment and the index of the next cell in the list. Since no cell is ever
removed from a list, there is no need for managing a free list: allocating a new
cell simply amounts to incrementing a global counter firstFreeCell that stores the
index of the first free cell in the buffer. The counter firstFreeCell is initialized to
zero. The increment is done atomically to guarantee that every thread allocating
new cells concurrently does obtain a unique memory address. A second array,
Search WWH ::




Custom Search