to utilize constant buffers in the same manner we have seen in the rendering pipeline.
Constant buffers provide read-only access to the data stored in them.
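As a brief illustration, a constant buffer in HLSL is declared with the cbuffer keyword and filled by the application before the shader runs; the buffer and member names here are purely illustrative:

```hlsl
// Constant buffer: read-only from the shader's point of view, with its
// contents supplied by the application. Names are hypothetical.
cbuffer TransformParams : register( b0 )
{
    matrix WorldViewProjMatrix;   // per-object transformation
    float4 TintColor;             // constant color parameter
};
```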
Among all the different types of resources that these access mechanisms can attach
to, there are literally gigabytes of storage available for shader programs to use. However,
to accommodate this large amount of data, it must be stored in memory located outside
of the GPU itself. It is currently not feasible to store such large amounts of data within a
processor, so off-board memory modules are generally used. This memory is normally
accessed with a very high bandwidth connection, but there is also a relatively high latency
between the time a value is requested and when it is returned. Because of this, device
memory resources are considerably slower than register-based memory. While an unor-
dered access view can be used to implement the same operations in device memory as in
register-based memory, there would be significant performance penalties when frequent
read and write operations are performed.
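As a sketch of the access pattern being described, the following compute shader binds a structured buffer through an unordered access view and performs a read and a write against it; every such access goes to device memory, so the latency discussed above applies. The resource and entry-point names are assumptions for the example:

```hlsl
// A read/write structured buffer bound through an unordered access view.
// Each read and write here touches device memory, so frequent accesses of
// this kind incur the latency penalty described in the text.
RWStructuredBuffer<float4> DeviceData : register( u0 );

[numthreads( 32, 1, 1 )]
void CSMAIN( uint3 id : SV_DispatchThreadID )
{
    float4 value = DeviceData[id.x];   // read from device memory
    DeviceData[id.x] = value * 2.0f;   // write back to device memory
}
```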
Another consideration for device memory resources is that access to these resources
is provided to all threads that are executing the current shader program. This means that if
there are 6400 threads (as we saw in our initial dispatch example), each of those threads
can read or write to any location within a resource through an unordered access view.
Naturally, this requires manual synchronization of access to the resource, either by us-
ing atomic operations or by defining an access paradigm that can adequately ensure that
threads will not overwrite each other's desired data ranges.
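For instance, when many threads must update a single shared counter, an atomic intrinsic serializes the updates so that no increment is lost. The following hedged sketch (buffer name and byte offset are illustrative) uses InterlockedAdd on a raw buffer:

```hlsl
// Many threads incrementing one shared counter: the atomic operation
// serializes the read-modify-write so no increment is lost.
RWByteAddressBuffer CounterBuffer : register( u0 );

[numthreads( 32, 1, 1 )]
void CSMAIN( uint3 id : SV_DispatchThreadID )
{
    uint previous;
    // Atomically add 1 at byte offset 0; 'previous' receives the old value.
    CounterBuffer.InterlockedAdd( 0, 1, previous );
}
```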
5.3.3 Group Shared Memory
As discussed in Chapter 3, "The Rendering Pipeline," all of the programmable shader
stages in the rendering pipeline are kernel based. Each instance of these pipeline
kernels executes in complete isolation from the others. The compute shader breaks
free from this paradigm and allows the use of a shared memory area that can be accessed
simultaneously by more than one thread. In fact, every thread in a complete thread group is
allowed to access the same memory area. This gives the shared memory its name: group
shared memory (GSM). The group shared memory is limited to 32 KB per thread
group, but it is accessible to all of the threads in that group. It is intended to reside
on the GPU processor die, which allows for much faster access than the device memory
resources.
The group shared memory is declared in the global scope of the compute shader with
a special storage scope identifier called groupshared. This memory area can be declared
either as an array of basic data types, or as structures of more complex data arrangements.
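Both declaration forms can be sketched as follows; the array sizes and the structure layout are illustrative, chosen to stay well under the 32 KB per-group limit:

```hlsl
// Group shared memory: visible to every thread in the thread group,
// limited to 32 KB per group. Sizes here are illustrative.
groupshared float4 SharedData[ 256 ];        // array of a basic type

struct Particle
{
    float3 position;
    float3 velocity;
};
groupshared Particle SharedParticles[ 64 ];  // array of a structure
```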
Once a thread group is instantiated, the group shared memory is available to all of its
threads simultaneously. Since the entire group shared memory is available to all threads in
the group, the compute shader program must determine how the threads will interact with