structure variable Output of type f2p. In the body of main the output depth value is copied from the input depth value, while the output color is set to the attenuated input color. Note that HLSL defines multiplication of a vector value (Input.Color) by a scalar value (brightness) in the mathematical sense: The result is a vector of the same dimension, with each component multiplied by the scalar value.
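A minimal sketch in this spirit is shown below; only f2p, main, Input.Color, and brightness come from the discussion above, while the input structure v2f, the Depth field, and the semantics used are assumptions chosen for illustration rather than the exact listing.

    float brightness;                  // scalar attenuation factor, set by the application

    struct v2f {                       // assumed per-fragment input (interpolated values)
        float4 Color : COLOR0;
        float  Depth : TEXCOORD0;
    };

    struct f2p {                       // fragment-to-pixel output structure
        float4 Color : COLOR0;
        float  Depth : DEPTH;
    };

    f2p main(v2f Input)
    {
        f2p Output;
        Output.Depth = Input.Depth;               // output depth copied from input depth
        Output.Color = Input.Color * brightness;  // vector * scalar: each component is scaled
        return Output;
    }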
Beginning with Direct3D 10 (2006), HLSL is the only way that programmers can specify a vertex, primitive, or fragment shader using Direct3D. Earlier versions of Direct3D included assembly-language-like interfaces that were deprecated in Direct3D 9 and are unavailable in Direct3D 10. Thus, HLSL is the portion of the Direct3D architecture through which programmers specify the operation of a shader. While it is not the intention of this section to provide a coding tutorial for Direct3D shaders, this simple example includes many of the key ideas.
A useful metaphor for the operation of this simple shader is that of a heater operating on a stream of flowing water. Pixel fragments arrive in an ordered sequence, like water flowing through a pipe. A simple operation is applied to each pixel fragment, just as the heater warms each unit of water. Finally, the pixel fragments are sent along for further processing, just as water flows out of the heater through the exit pipe. Indeed, this metaphor is so apt that GPU processing is often referred to as stream processing. That's how we treat GPU programmability in this section, although the following section (Section 38.6) will consider an important exception with significant implications for GPU implementation.
Writing highly parallel code on a general-purpose CPU is difficult—only a small subset of computer programmers is thought to be able to achieve reliable operation and scalable performance of parallel code.6 Yet writing shaders is a straightforward task: Even novice programmers achieve correct and high-performance results. The difference in difficulty is the result of differences between the general-purpose architectures of CPUs and the special-purpose architectures of GPUs. The Single Program Multiple Data (SPMD) abstraction employed by GPUs allows shader writers to think of only a single vertex, primitive, or fragment while they code—the implementation takes care of all the details required to execute the shaders in the correct order, on the correct data, while efficiently utilizing the data-parallel circuitry. CPU programmers, on the other hand, must carefully consider parallelism while they code, because their code specifies the details of any thread-level parallelism that is to be supported.
While GPU programming is experienced as a recent and exciting development, the ability to program GPUs is as old as GPUs themselves. The Ikonas graphics system [Eng86], introduced in 1978, exposed a fully programmable architecture to application developers. Ten years later Trancept introduced the TAAC-1 graphics processor, which included a C-language microcode compiler to simplify the development of application-specific code. The architectures of these early GPUs were akin to extended CPU architectures; they bore little similarity to the more specialized pipeline architecture that is the subject of this chapter, and that has coevolved with mainstream GPU implementations since the early 1980s. Unlike these CPU-like architectures, the pipeline architecture was without support for application-specified programs (shaders) until both OpenGL and Direct3D were
6. It's easy to write code that exploits the circuit-level parallelism of the CPU, and operating systems make it easy to exploit virtual parallelism by concurrently executing multiple programs. The parallelism that is difficult to exploit is at the intermediate level: multiple-thread task parallelism within a single program.