to utilize constant buffers in the same manner we have seen in the rendering pipeline.
Constant buffers provide read-only access to the data stored in them.
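As a brief illustration, a constant buffer in HLSL is declared with the cbuffer keyword and filled by the application before the shader runs; the buffer and member names here are purely illustrative:

```hlsl
// Constant buffer: read-only from the shader's point of view, with its
// contents supplied by the application. Names are hypothetical.
cbuffer TransformParams : register( b0 )
{
    matrix WorldViewProjMatrix;   // per-object transformation
    float4 TintColor;             // constant color parameter
};
```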
Among all the different types of resources that these access mechanisms can attach
to, there are literally gigabytes of storage available for shader programs to use. However,
to accommodate this large amount of data, it must be stored in memory located outside
of the GPU itself. It is currently not feasible to store such large amounts of data within a
processor, so off-board memory modules are generally used. This memory is normally
accessed with a very high bandwidth connection, but there is also a relatively high latency
between the time a value is requested and when it is returned. Because of this, device
memory resources are considerably slower than register-based memory. While an unor-
dered access view can be used to implement the same operations in device memory as in
register-based memory, there would be significant performance penalties when frequent
read and write operations are performed.
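As a sketch of the access pattern being described, the following compute shader binds a structured buffer through an unordered access view and performs a read and a write against it; every such access goes to device memory, so the latency discussed above applies. The resource and entry-point names are assumptions for the example:

```hlsl
// A read/write structured buffer bound through an unordered access view.
// Each read and write here touches device memory, so frequent accesses of
// this kind incur the latency penalty described in the text.
RWStructuredBuffer<float4> DeviceData : register( u0 );

[numthreads( 32, 1, 1 )]
void CSMAIN( uint3 id : SV_DispatchThreadID )
{
    float4 value = DeviceData[id.x];   // read from device memory
    DeviceData[id.x] = value * 2.0f;   // write back to device memory
}
```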
Another consideration for device memory resources is that access to these resources
is provided to all threads that are executing the current shader program. This means that if
there are 6400 threads (as we saw in our initial dispatch example), each of those threads
can read or write to any location within a resource through an unordered access view.
Naturally, this requires manual synchronization of access to the resource, either by us-
ing atomic operations or by defining an access paradigm that can adequately ensure that
threads will not overwrite each other's desired data ranges.
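For instance, when many threads must update a single shared counter, an atomic intrinsic serializes the updates so that no increment is lost. The following hedged sketch (buffer name and byte offset are illustrative) uses InterlockedAdd on a raw buffer:

```hlsl
// Many threads incrementing one shared counter: the atomic operation
// serializes the read-modify-write so no increment is lost.
RWByteAddressBuffer CounterBuffer : register( u0 );

[numthreads( 32, 1, 1 )]
void CSMAIN( uint3 id : SV_DispatchThreadID )
{
    uint previous;
    // Atomically add 1 at byte offset 0; 'previous' receives the old value.
    CounterBuffer.InterlockedAdd( 0, 1, previous );
}
```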
5.3.3 Group Shared Memory
As discussed in Chapter 3, "The Rendering Pipeline," all of the programmable shader
stages in the rendering pipeline are kernel based. Each instance of these pipeline
kernels executes in complete isolation from the others. The compute shader breaks
free from this paradigm and allows the use of a shared memory area that can be accessed
simultaneously by more than one thread. In fact, every thread in a complete thread group is
allowed to access the same memory area. This gives the shared memory its name: group
shared memory (GSM). The group shared memory is limited to 32 KB per thread
group, but it is accessible to all of the threads in that group. It is intended to reside
on the GPU processor die, which allows for much faster access than the device memory
resources.
The group shared memory is declared in the global scope of the compute shader with
a special storage scope identifier called groupshared. This memory area can be declared
either as an array of basic data types, or as structures of more complex data arrangements.
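Both declaration forms can be sketched as follows; the array sizes and the structure layout are illustrative, chosen to stay well under the 32 KB per-group limit:

```hlsl
// Group shared memory: visible to every thread in the thread group,
// limited to 32 KB per group. Sizes here are illustrative.
groupshared float4 SharedData[ 256 ];        // array of a basic type

struct Particle
{
    float3 position;
    float3 velocity;
};
groupshared Particle SharedParticles[ 64 ];  // array of a structure
```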
Once a thread group is instantiated, the group shared memory is available to all of its
threads simultaneously. Since the entire group shared memory is available to all threads in
the group, the compute shader program must determine how the threads will interact with