The Computation Pipeline - Practical Rendering and Computation with Direct3D 11

and use the memory, and hence it also must synchronize access to that memory. This will

depend on the algorithm being implemented, but it typically involves using the thread

addressing mechanisms described in the previous sections to ensure that access to the GSM

is performed safely, without the need for atomic functions.
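
As a minimal sketch of this idea, the compute shader below gives every thread exactly one GSM slot, selected by its SV_GroupIndex, so no two threads ever write the same location and no atomic functions are needed. The resource names (InputValues, OutputValues), the array name sharedValues, the entry point CSMain, and the group size of 64 are assumptions made for illustration; GroupMemoryBarrierWithGroupSync is the standard HLSL intrinsic that makes the writes visible before any thread reads a slot it does not own.

```hlsl
// Illustrative sketch only: all names and sizes below are hypothetical.
#define GROUP_SIZE 64

StructuredBuffer<float>   InputValues  : register(t0);
RWStructuredBuffer<float> OutputValues : register(u0);

// One GSM slot per thread in the group.
groupshared float sharedValues[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dispatchID : SV_DispatchThreadID,
            uint  groupIndex : SV_GroupIndex)
{
    // Each thread writes only the slot addressed by its own flattened
    // index within the group, so writes never collide.
    sharedValues[groupIndex] = InputValues[dispatchID.x];

    // Wait until every thread in the group has finished its write.
    GroupMemoryBarrierWithGroupSync();

    // Reading another thread's slot is now safe.
    uint neighbor = (groupIndex + 1) % GROUP_SIZE;
    OutputValues[dispatchID.x] = sharedValues[groupIndex] + sharedValues[neighbor];
}
```

The sketch assumes the dispatch covers a whole number of 64-element groups. The key design choice is that the write index depends only on the thread's own address, which is what removes the need for atomics.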

While group shared memory provides a fast and efficient means for threads within a

thread group to share information, it does come with some limitations. It is limited

to 32 KB per thread group, so it is not sufficient whenever a larger shared pool of memory

is needed. In addition, sharing information is confined to within a single thread group.

If an algorithm calls for more threads than fit in one thread group to all have access to

the same shared memory pool, group shared memory is not an option.
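
To make the size limit concrete, the sketch below works through the arithmetic for a hypothetical 16x16 thread group that stores one float4 per thread. The names (TILE_DIM, tileColors, InputTexture, OutputTexture, CSMain) are assumptions for illustration, and the mirrored read at the end exists only so that the shared array is actually used.

```hlsl
// Illustrative sketch only: all names and sizes below are hypothetical.
#define TILE_DIM    16
#define TILE_TEXELS (TILE_DIM * TILE_DIM)   // 256 threads per group

Texture2D<float4>   InputTexture  : register(t0);
RWTexture2D<float4> OutputTexture : register(u0);

// Budget check: a float4 occupies 16 bytes, so
//   256 elements * 16 bytes = 4096 bytes (4 KB),
// well inside the 32 KB (32768 byte) per-group limit.
// The limit is reached at 32768 / 16 = 2048 float4 elements; a larger
// array is expected to be rejected when compiling for cs_5_0.
groupshared float4 tileColors[TILE_TEXELS];

[numthreads(TILE_DIM, TILE_DIM, 1)]
void CSMain(uint3 dispatchID : SV_DispatchThreadID,
            uint  groupIndex : SV_GroupIndex)
{
    // Each thread fills its own slot of the tile.
    tileColors[groupIndex] = InputTexture[dispatchID.xy];

    // Make all writes visible to the whole group before reading.
    GroupMemoryBarrierWithGroupSync();

    // Read the slot written by the "opposite" thread of the same group.
    // Only threads of this group can ever see tileColors.
    OutputTexture[dispatchID.xy] = tileColors[TILE_TEXELS - 1 - groupIndex];
}
```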

The group shared memory joins the register-based memory of the programmable

shader cores and the larger resource-based memory that can be bound to the pipeline.

These three types of memory provide a variety of access speeds and available sizes, which

can be used in different situations that match their abilities. Register-based memory is the

fastest to access, but it has the smallest amount of memory available. Device memory

resources provide gigantic available memory sizes, but also exhibit the slowest access times.

Group shared memory strikes a balance between the other two. It is faster to access than

resource memory and provides a larger available memory size than register-based memory.

This is a major increase in flexibility for the compute shader. With a memory area

that is accessible to multiple threads, it is possible for threads to share information with one

another. It also increases the potential for improved efficiency of a thread group as a whole.

For example, texture accesses needed by more than one thread could be loaded by one

thread and then shared with all other threads in the thread group. This would effectively

lower the overall number of texture accesses for the thread group, and would thus improve

the overall efficiency of the algorithm. In this way GSM can be used as a customized

memory cache that can be directly controlled by the shader program. Sharing between threads

is not limited to simple texture accesses, either. It is also possible to share intermediate

calculations between threads, which would otherwise need to be performed individually.

We will see examples of both of these optimizations in the second half of this topic.
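
The texture-caching idea described above can be sketched as follows. This is a hedged illustration rather than the implementation used later in the text: a hypothetical horizontal box filter in which each thread group loads the texels it needs into GSM once, and every thread then reads its neighbors from the cache. The names (InputTexture, OutputTexture, rowCache, CSMain), the group size of 64, and the filter radius of 2 are all assumptions.

```hlsl
// Illustrative sketch only: all names and sizes below are hypothetical.
#define GROUP_SIZE 64
#define RADIUS     2
#define CACHE_SIZE (GROUP_SIZE + 2 * RADIUS)   // 68 texels per group

Texture2D<float4>   InputTexture  : register(t0);
RWTexture2D<float4> OutputTexture : register(u0);

// Shader-managed cache holding one row segment plus its borders.
groupshared float4 rowCache[CACHE_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 groupID       : SV_GroupID,
            uint3 groupThreadID : SV_GroupThreadID,
            uint3 dispatchID    : SV_DispatchThreadID)
{
    uint width, height;
    InputTexture.GetDimensions(width, height);

    // X coordinate of the first texel this group needs (may be negative).
    int firstX = (int)(groupID.x * GROUP_SIZE) - RADIUS;

    // Cooperative load: the group performs 68 texture reads in total.
    // The first few threads each load one extra border texel.
    for (uint i = groupThreadID.x; i < CACHE_SIZE; i += GROUP_SIZE)
    {
        int x = clamp(firstX + (int)i, 0, (int)width - 1);
        rowCache[i] = InputTexture[uint2((uint)x, dispatchID.y)];
    }

    // Ensure the cache is completely filled before any thread reads it.
    GroupMemoryBarrierWithGroupSync();

    // Each thread averages 5 texels, all served from GSM.
    float4 sum = float4(0.0f, 0.0f, 0.0f, 0.0f);
    for (uint k = 0; k <= 2 * RADIUS; ++k)
    {
        sum += rowCache[groupThreadID.x + k];
    }
    OutputTexture[dispatchID.xy] = sum / (2 * RADIUS + 1);
}
```

Under these assumptions, a group issues 68 texture reads instead of the 64 * 5 = 320 it would need if every thread sampled its own five texels directly.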

5.4 Thread Synchronization

With a large number of threads operating simultaneously, and with the ability for threads

to interact with one another through either the group shared memory or through unordered

access views of resources, there is clearly a need to be able to synchronize memory access

between threads. As with traditional multithreaded programming, where many threads can

read and write the same memory locations, there is a potential for memory corruption due

to read-after-write hazards (additional details about multithreaded programming on the
