computational power is available is trivial. This is even more true for GPGPU applications,
in which the user typically processes as much data as possible, as fast as possible. If an
algorithm is implemented in the compute shader, it can easily scale to the current system's
capabilities.
With such a wide variety of benefits, and tight integration with Direct3D 11, DirectCompute represents the first steps toward having a fully general-purpose coprocessor. Throughout the rest of this chapter, we will take a detailed look at how this technology works and how it can be used.
5.1.2 The Compute Shader Stage
The compute shader follows the same general usage concept as the other programmable shader stages. A shader program is compiled and then used to create a shader object through the device interface, which can then be loaded into the compute shader stage through the device context interface. The stage can take advantage of the same set of resources, constant buffers, and samplers that we have seen in Chapter 3 for the other rendering pipeline stages, with the additional capability to bind resources to the stage with unordered access views. However, the compute shader is fundamentally different from the other programmable pipeline stages, since it doesn't explicitly have an input or output that is passed from a previous stage or passed to the next stage. It can receive some system value semantics (which will be discussed shortly) as input arguments, but this is the only attribute-type data that can be used in a shader program. All of the remaining data input and output is performed through resources instead.
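To make this sequence concrete, the following C++ sketch compiles a compute shader, creates the shader object through the device interface, and then binds it, a constant buffer, and an unordered access view through the device context. The file name CSExample.hlsl, the entry point CSMain, and the resource pointers passed in are hypothetical placeholders used only for illustration, not values taken from the text.

#include <d3d11.h>
#include <d3dcompiler.h>   // link with d3dcompiler.lib

// A minimal sketch: compile, create, and bind a compute shader.
HRESULT BindExampleComputeShader( ID3D11Device* pDevice,
                                  ID3D11DeviceContext* pContext,
                                  ID3D11UnorderedAccessView* pOutputUAV,
                                  ID3D11Buffer* pConstants )
{
    // Compile the HLSL source to bytecode with the cs_5_0 target.
    ID3DBlob* pCode = nullptr;
    ID3DBlob* pErrors = nullptr;
    HRESULT hr = D3DCompileFromFile( L"CSExample.hlsl", nullptr, nullptr,
                                     "CSMain", "cs_5_0", 0, 0,
                                     &pCode, &pErrors );
    if ( FAILED( hr ) )
    {
        if ( pErrors ) pErrors->Release();
        return hr;
    }

    // Create the shader object through the device interface.
    ID3D11ComputeShader* pComputeShader = nullptr;
    hr = pDevice->CreateComputeShader( pCode->GetBufferPointer(),
                                       pCode->GetBufferSize(),
                                       nullptr, &pComputeShader );
    pCode->Release();
    if ( FAILED( hr ) )
        return hr;

    // Load the shader into the compute shader stage through the device
    // context, and bind its resources: a constant buffer in slot b0 and
    // an unordered access view in slot u0.
    pContext->CSSetShader( pComputeShader, nullptr, 0 );
    pContext->CSSetConstantBuffers( 0, 1, &pConstants );
    UINT initialCount = 0xFFFFFFFF;  // -1: leave any hidden counter unchanged
    pContext->CSSetUnorderedAccessViews( 0, 1, &pOutputUAV, &initialCount );

    pComputeShader->Release();  // the bound context holds its own reference
    return S_OK;
}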
This arrangement means that the compute shader implements complete algorithms within a single program, as opposed to the rendering pipeline, which has the option to implement an algorithm over many different pipeline stages. Of course, the compute shader can also be used to implement an algorithm iteratively, in steps, but the choice of how best to design the algorithm is left to the developer. This is an interesting design, and it allows algorithms to be composed in ways that were not possible prior to the introduction of the compute shader. We will further explore the details of how to use the compute shader from the API perspective later in the chapter.
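As a sketch of the iterative, multi-step usage mentioned above, the fragment below runs an already-bound compute shader twice, "ping-ponging" which buffer is read and which is written between the two dispatches. The view pointers and thread group counts are hypothetical placeholders, not values prescribed by the text.

// A sketch of running an algorithm in two compute passes by swapping
// the input and output resources between dispatches. All pointers and
// group counts here are placeholders for illustration only.
void TwoPassExample( ID3D11DeviceContext* pContext,
                     ID3D11ShaderResourceView* pBufferA_SRV,
                     ID3D11UnorderedAccessView* pBufferA_UAV,
                     ID3D11ShaderResourceView* pBufferB_SRV,
                     ID3D11UnorderedAccessView* pBufferB_UAV )
{
    ID3D11ShaderResourceView* nullSRV = nullptr;
    ID3D11UnorderedAccessView* nullUAV = nullptr;

    // Pass 1: read buffer A, write buffer B.
    pContext->CSSetShaderResources( 0, 1, &pBufferA_SRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &pBufferB_UAV, nullptr );
    pContext->Dispatch( 64, 1, 1 );

    // Unbind before swapping roles: a resource cannot be bound for
    // reading and writing at the same time.
    pContext->CSSetShaderResources( 0, 1, &nullSRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &nullUAV, nullptr );

    // Pass 2: read buffer B, write buffer A.
    pContext->CSSetShaderResources( 0, 1, &pBufferB_SRV );
    pContext->CSSetUnorderedAccessViews( 0, 1, &pBufferA_UAV, nullptr );
    pContext->Dispatch( 64, 1, 1 );
}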
5.2 DirectCompute Threading Model
We begin our journey through DirectCompute by examining its threading and execution
model. We have already noted that the GPU is very good at processing parallel algorithms
due to its large number of processing cores. With so many processors available to perform