computational power is available is trivial. This is even more true for GPGPU applications,
in which the user typically processes as much data as possible, as fast as possible. If an
algorithm is implemented in the compute shader, it can easily scale to the current system's
capabilities.
With such a wide variety of benefits, and tight integration with Direct3D 11, DirectCompute represents the first steps toward having a fully general-purpose coprocessor. Throughout the rest of this chapter, we will take a detailed look at how this technology works and how it can be used.
5.1.2 The Compute Shader Stage
The compute shader follows the same general usage concept as the other programmable shader stages. A shader program is compiled and then used to create a shader object through the device interface, which can then be loaded into the compute shader stage through the device context interface. The stage can take advantage of the same set of resources, constant buffers, and samplers that we have seen in Chapter 3 for the other rendering pipeline stages, with the additional capability to bind resources to the stage with unordered access views. However, the compute shader is fundamentally different from the other programmable pipeline stages, since it doesn't explicitly have an input or output that is passed from a previous stage or passed to the next stage. It can receive some system value semantics (which will be discussed shortly) as input arguments, but this is the only attribute-type data that can be used in a shader program. All of the remaining data input and output is performed through resources instead.
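To make this sequence concrete, the following C++ sketch compiles a compute shader, creates the shader object through the device interface, and then binds it, a constant buffer, and an unordered access view through the device context. The file name CSExample.hlsl, the entry point CSMain, and the resource pointers passed in are hypothetical placeholders used only for illustration, not values taken from the text.

#include <d3d11.h>
#include <d3dcompiler.h>   // link with d3dcompiler.lib

// A minimal sketch: compile, create, and bind a compute shader.
HRESULT BindExampleComputeShader( ID3D11Device* pDevice,
                                  ID3D11DeviceContext* pContext,
                                  ID3D11UnorderedAccessView* pOutputUAV,
                                  ID3D11Buffer* pConstants )
{
    // Compile the HLSL source to bytecode with the cs_5_0 target.
    ID3DBlob* pCode = nullptr;
    ID3DBlob* pErrors = nullptr;
    HRESULT hr = D3DCompileFromFile( L"CSExample.hlsl", nullptr, nullptr,
                                     "CSMain", "cs_5_0", 0, 0,
                                     &pCode, &pErrors );
    if ( FAILED( hr ) )
    {
        if ( pErrors ) pErrors->Release();
        return hr;
    }

    // Create the shader object through the device interface.
    ID3D11ComputeShader* pComputeShader = nullptr;
    hr = pDevice->CreateComputeShader( pCode->GetBufferPointer(),
                                       pCode->GetBufferSize(),
                                       nullptr, &pComputeShader );
    pCode->Release();
    if ( FAILED( hr ) )
        return hr;

    // Load the shader into the compute shader stage through the device
    // context, and bind its resources: a constant buffer in slot b0 and
    // an unordered access view in slot u0.
    pContext->CSSetShader( pComputeShader, nullptr, 0 );
    pContext->CSSetConstantBuffers( 0, 1, &pConstants );
    UINT initialCount = 0xFFFFFFFF;  // -1: leave any hidden counter unchanged
    pContext->CSSetUnorderedAccessViews( 0, 1, &pOutputUAV, &initialCount );

    pComputeShader->Release();  // the bound context holds its own reference
    return S_OK;
}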
This arrangement means that the compute shader implements complete algorithms within a single program, as opposed to the rendering pipeline, which has the option to implement an algorithm over many different pipeline stages. Of course, the compute shader can also be used to implement an algorithm iteratively, in steps, but the choice of how best to design the algorithm is left to the developer. This is an interesting design, and it allows algorithms to be composed in ways that were not possible prior to the introduction of the compute shader. We will further explore the details of how to use the compute shader from the API perspective later in the chapter.
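As a sketch of the iterative, multi-step usage mentioned above, the fragment below runs an already-bound compute shader twice, "ping-ponging" which buffer is read and which is written between the two dispatches. The view pointers and thread group counts are hypothetical placeholders, not values prescribed by the text.

// A sketch of running an algorithm in two compute passes by swapping
// the input and output resources between dispatches. All pointers and
// group counts here are placeholders for illustration only.
void TwoPassExample( ID3D11DeviceContext* pContext,
                     ID3D11ShaderResourceView* pBufferA_SRV,
                     ID3D11UnorderedAccessView* pBufferA_UAV,
                     ID3D11ShaderResourceView* pBufferB_SRV,
                     ID3D11UnorderedAccessView* pBufferB_UAV )
{
    ID3D11ShaderResourceView* nullSRV = nullptr;
    ID3D11UnorderedAccessView* nullUAV = nullptr;

    // Pass 1: read buffer A, write buffer B.
    pContext->CSSetShaderResources( 0, 1, &pBufferA_SRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &pBufferB_UAV, nullptr );
    pContext->Dispatch( 64, 1, 1 );

    // Unbind before swapping roles: a resource cannot be bound for
    // reading and writing at the same time.
    pContext->CSSetShaderResources( 0, 1, &nullSRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &nullUAV, nullptr );

    // Pass 2: read buffer B, write buffer A.
    pContext->CSSetShaderResources( 0, 1, &pBufferB_SRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &pBufferA_UAV, nullptr );
    pContext->Dispatch( 64, 1, 1 );
}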
5.2 DirectCompute Threading Model
We begin our journey through DirectCompute by examining its threading and execution
model. We have already noted that the GPU is very good at processing parallel algorithms
due to its large number of processing cores. With so many processors available to perform