There are of course more operations for which one might wish to provide an
abstracted interface. These include per-object and per-mesh transformations, tes-
sellation of curved patches into triangles, and per-triangle operations like silhou-
ette detection or surface extrusion. Various APIs offer abstractions of these within
a programming model similar to vertex and pixel shaders.
Chapter 38 discusses how GPUs are designed to execute this pipeline effi-
ciently. Also refer to your API manual for a discussion of the additional stages
(e.g., tessellate, geometry) that may be available.
15.7.2 Interface
The interface to a software rasterization API can be very simple. Because a soft-
ware rasterizer uses the same memory space and execution model as the host pro-
gram, one can pass the scene as a pointer and the callbacks as function pointers or
classes with virtual methods. Rather than individual triangles, it is convenient to
pass whole meshes to a software rasterizer to decrease the per-triangle overhead.
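As a sketch of that calling convention (the `Vertex`, `Mesh`, and `forEachTriangle` names are invented for illustration, not drawn from any particular API), the host can hand the rasterizer a whole mesh and an ordinary function object:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical vertex and mesh types for illustration.
struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<int> indices;  // three indices per triangle
};

// Per-triangle callback supplied by the host program. Because a
// software rasterizer shares the host's memory space, the scene is
// passed by reference and the callback as a plain function object
// (function pointers or virtual methods would serve equally well).
using TriangleFn =
    std::function<void(const Vertex&, const Vertex&, const Vertex&)>;

// Accepting a whole mesh rather than one triangle at a time reduces
// the per-triangle call overhead to a single indexed loop iteration.
void forEachTriangle(const Mesh& mesh, const TriangleFn& fn) {
    for (std::size_t i = 0; i + 2 < mesh.indices.size(); i += 3) {
        fn(mesh.vertices[mesh.indices[i]],
           mesh.vertices[mesh.indices[i + 1]],
           mesh.vertices[mesh.indices[i + 2]]);
    }
}
```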
For a hardware rasterization API, the host machine (i.e., CPU) and graphics
device (i.e., GPU) may have separate memory spaces and execution models. In
this case, shared memory and function pointers no longer suffice. Hardware ras-
terization APIs therefore must impose an explicit memory boundary and narrow
entry points for negotiating it. (This is also true of the fallback and reference soft-
ware implementations of those APIs, such as Mesa and DXRefRast.) Such an API
requires the following entry points, which are detailed in subsequent subsections.
1. Allocate device memory.
2. Copy data between host and device memory.
3. Free device memory.
4. Load (and compile) a shading program from source.
5. Configure the output merger and other fixed-function state.
6. Bind a shading program and set its arguments.
7. Launch a draw call, a set of device threads to render a triangle list.
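A toy in-process model of entry points 1-3 might look like the following. The handle type and function names here are hypothetical stand-ins; real APIs (OpenGL buffer objects, CUDA device pointers) differ in detail but expose the same narrow boundary:

```cpp
#include <cstdlib>
#include <cstring>

// An opaque handle standing in for a pointer into device memory;
// the host cannot dereference it directly, only pass it back to
// the API's entry points.
using DeviceHandle = void*;

DeviceHandle deviceAlloc(std::size_t bytes) {            // 1. allocate
    return std::malloc(bytes);
}
void copyHostToDevice(DeviceHandle dst, const void* src,
                      std::size_t n) {                   // 2. copy in
    std::memcpy(dst, src, n);
}
void copyDeviceToHost(void* dst, DeviceHandle src,
                      std::size_t n) {                   // 2. copy out
    std::memcpy(dst, src, n);
}
void deviceFree(DeviceHandle h) {                        // 3. free
    std::free(h);
}
```

Because all traffic funnels through these few calls, the implementation is free to place the data in a physically separate memory without the host noticing anything but the transfer cost.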
15.7.2.1 Memory Principles
The memory management routines are conceptually straightforward. They
correspond to malloc, memcpy, and free, and they are typically applied to large
arrays, such as an array of vertex data. They are complicated by the details neces-
sary to achieve high performance for the case where data must be transferred per
rendered frame, rather than once per scene. This occurs when streaming geome-
try for a scene that is too large for the device memory; for example, in a world
large enough that the viewer can only ever observe a small fraction at a time. It
also occurs when a data stream from another device, such as a camera, is an input
to the rendering algorithm. Furthermore, hybrid software-hardware rendering and
physics algorithms perform some processing on each of the host and device and
must communicate each frame.
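One common remedy for per-frame transfer cost is double-buffered staging: the host fills one buffer for frame N+1 while the device consumes the other for frame N, so transfer overlaps rendering. A minimal sketch, with invented names (real APIs express this as ring buffers or reallocated dynamic buffers):

```cpp
#include <vector>

// Two staging buffers that swap roles once per frame.
struct StagingPair {
    std::vector<float> buffers[2];
    int hostIndex = 0;  // buffer the host is currently filling

    std::vector<float>& hostBuffer()   { return buffers[hostIndex]; }
    std::vector<float>& deviceBuffer() { return buffers[1 - hostIndex]; }

    // Called once per frame, after the host finishes writing and the
    // device finishes reading, to exchange the two roles.
    void flip() { hostIndex = 1 - hostIndex; }
};
```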
One complicating factor for memory transfer is that it is often desirable to
adjust the data layout and precision of arrays during the transfer. The data struc-
ture for 2D buffers such as images and depth buffers on the host often resembles
the “linear,” row-major ordering that we have used in this chapter. On a graphics processor, 2D buffers are often wrapped along Hilbert or Z-shaped (Morton) curves.
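The Morton ordering mentioned here interleaves the bits of the x and y coordinates, so texels that are close in 2D stay close in memory. A small sketch:

```cpp
#include <cstdint>

// Insert a zero bit between each of the low 16 bits of v,
// e.g., binary 11 -> 0101.
static std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Morton (Z-order) index for a 16-bit (x, y) pair: x occupies the
// even bit positions and y the odd ones.
std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

Walking indices 0, 1, 2, 3, ... under this mapping visits (0,0), (1,0), (0,1), (1,1), tracing the "Z" shape that gives the curve its name.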
 
 