Per-Pixel Lists for Single Pass A-Buffer - GPU Pro: Advanced Rendering Techniques

Graphics Reference

In-Depth Information

In the Post-sort method, sorting happens in local memory, which is faster

but limits the maximum number of fragments associated with a pixel to a few

hundred. Allocating more local memory for sorting more fragments increases

register pressure and reduces parallelism and performance.

1.1.3 Allocation Strategies

In addition to the scheduling of the sort, we examine two strategies for allocating

cells containing fragment information in the main buffer. The first one, Lin-

alloc, stores fragments in per-pixel linked-lists and allocates fresh cells linearly

from the start of the buffer to its end. Since many allocations are done concur-

rently, the address of a fresh cell is obtained by atomically incrementing a global

counter. Additional memory is necessary to store the address of the first cell

(head) of the list of fragments of each pixel. Section 1.2 details the Lin-alloc

strategy.

The second strategy that we examine, Open-alloc, is randomized and some-

what more involved. To each pixel p we associate a pseudo-random sequence of

cell positions in the main buffer: ( h ( p,i )) i≥ 1 ,for i ranging over the integers. In

the spirit of the “open addressing” techniques for hash tables, the cells at posi-

tions h ( p,i ) are examined by increasing value of i until an empty one is found.

A non-empty cell in this sequence may store another fragment associated with

pixel p or with a different pixel q . Such a collision between fragments must be

detected and handled correctly. Section 1.3 details the Open-alloc strategy.

The combination of two allocation strategies (Lin-alloc and Open-alloc)

with two schedules for sorting (Post-sort and Pre-sort) gives us four vari-

ations for building an A-buffer: Post-Lin (Sections 1.2.1 and 1.2.2), Pre-Lin

(Section 1.2.3), Post-Open (Section 1.3.2) and Pre-Open (Section 1.3.3).

Section 1.4.1 details how fragments are sorted in local memory in the Ren-

der pass of the Post-sort method. Some memory management issues, including

buffer resizing, are addressed in Section 1.5, and information about our implemen-

tation is given in Section 1.6. In Section 1.7, we compare these four variations,

as implemented on a GeForce 480 and a GeForce Titan.

1.2 Linked Lists with Pointers ( Lin-alloc )

The first two approaches we describe construct linked lists in each pixel, allocating

data for new fragments linearly in the main buffer. A single cell contains the depth

of the fragment and the index of the next cell in the list. Since no cell is ever

removed from a list, there is no need for managing a free list: allocating a new

cell simply amounts to incrementing a global counter firstFreeCell that stores the

index of the first free cell in the buffer. The counter firstFreeCell is initialized to

zero. The increment is done atomically to guarantee that every thread allocating

new cells concurrently does obtain a unique memory address. A second array,

Search WWH ::

Custom Search

Home