and use the memory, and hence it also must synchronize access to that memory. This will
depend on the algorithm being implemented, but it typically involves using the thread addressing
mechanisms described in the previous sections to ensure that access to the GSM
is performed safely, without the need for atomic functions.
While group shared memory provides a fast and efficient means for threads within a
thread group to share information, it does come with some limitations. It is limited to
32 KB for each thread group, so it is not sufficient in any situation where a larger shared
pool of memory is needed. In addition, sharing information is confined to a single thread
group. If an algorithm calls for more threads than one thread group contains to all have
access to the same shared memory pool, group shared memory is not an option.
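As a rough illustration of the size limit, the short Python sketch below checks whether a per-group buffer would fit in the 32 KB budget. It is only back-of-the-envelope arithmetic, not shader code; the function name fits_in_gsm and the example tile sizes are hypothetical and chosen purely for illustration.

```python
# 32 KB per thread group, the limit stated in the text above.
GSM_LIMIT_BYTES = 32 * 1024


def fits_in_gsm(element_count, bytes_per_element):
    """Return True if a buffer of this size fits in one group's shared memory."""
    return element_count * bytes_per_element <= GSM_LIMIT_BYTES


# Assumed element type: four 32-bit floats, i.e. 16 bytes per element.
# A 32x32 tile needs 1024 * 16 = 16384 bytes.
print(fits_in_gsm(32 * 32, 16))  # True
# A 64x64 tile needs 4096 * 16 = 65536 bytes.
print(fits_in_gsm(64 * 64, 16))  # False
```

A calculation like this is typically done when choosing thread group dimensions, since the amount of data each group must cache grows with the group size.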
The group shared memory joins the register-based memory of the programmable
shader cores and the larger resource-based memory that can be bound to the pipeline.
These three types of memory provide a variety of access speeds and available sizes, which
can be used in different situations that match their abilities. Register-based memory is the
fastest to access, but it has the smallest amount of memory available. Device memory resources
provide gigantic available memory sizes, but also exhibit the slowest access times.
Group shared memory strikes a balance between the other two. It is faster to access than
resource memory and provides a larger available memory size than register-based memory.
This is a major increase in flexibility for the compute shader. With a memory area
that is accessible to multiple threads, it is possible for threads to share information with one
another. It also increases the potential for improved efficiency of a thread group as a whole.
For example, texture data needed by more than one thread could be loaded by one
thread and then shared with all other threads in the thread group. This would effectively
lower the overall number of texture accesses for the thread group, and would thus improve
the overall efficiency of the algorithm. In this way GSM can be used as a customized memory
cache that can be directly controlled by the shader program. Sharing between threads
is not limited to simple texture accesses, either. It is also possible to share intermediate
calculations between threads, which would otherwise need to be performed individually.
We will see examples of both of these optimizations in the second half of this topic.
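The caching idea described above can be sketched with a small CPU-side simulation. This is Python rather than shader code, and it runs the "threads" one after another, so it only illustrates the fetch counts, not real GPU behavior. The class CountingTexture, the two blur functions, and the one-dimensional box filter are all hypothetical choices made for this example.

```python
class CountingTexture:
    """Hypothetical 1D texture that counts how many times it is sampled."""

    def __init__(self, texels):
        self.texels = texels
        self.fetches = 0

    def load(self, i):
        self.fetches += 1
        # Clamp out-of-range coordinates to the nearest edge texel.
        return self.texels[min(max(i, 0), len(self.texels) - 1)]


def blur_naive(tex, group_start, group_size, radius):
    """Each simulated thread fetches its whole filter window from the texture."""
    out = []
    for t in range(group_size):
        center = group_start + t
        window = [tex.load(center + k) for k in range(-radius, radius + 1)]
        out.append(sum(window) / len(window))
    return out


def blur_shared(tex, group_start, group_size, radius):
    """The group fills a shared cache once, then every thread reads from it."""
    # Phase 1: fill the cache. cache[j] holds texel (group_start - radius + j).
    cache = [tex.load(group_start - radius + j)
             for j in range(group_size + 2 * radius)]
    # On a GPU, the group would have to synchronize here so that no thread
    # reads the cache before it has been completely written.
    # Phase 2: each thread reads its window from the cache only.
    out = []
    for t in range(group_size):
        window = cache[t:t + 2 * radius + 1]
        out.append(sum(window) / len(window))
    return out


texels = [float(i % 7) for i in range(256)]
naive_tex = CountingTexture(texels)
shared_tex = CountingTexture(texels)
a = blur_naive(naive_tex, 64, 64, 3)
b = blur_shared(shared_tex, 64, 64, 3)
print(a == b)              # True: both versions compute the same result
print(naive_tex.fetches)   # 448 = 64 threads * 7 texels each
print(shared_tex.fetches)  # 70 = 64 texels + 3 extra on each side
```

In this sketch the shared version performs roughly one fetch per thread instead of one fetch per filter tap, which is the kind of saving the text describes. The comment between the two phases marks where a real thread group would need to synchronize.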
5.4 Thread Synchronization
With a large number of threads operating simultaneously, and with the ability for threads
to interact with one another through either the group shared memory or through unordered
access views of resources, there is clearly a need to be able to synchronize memory access
between threads. As with traditional multithreaded programming, where many threads can
read and write the same memory locations, there is a potential for memory corruption due
to read-after-write hazards (additional details about multithreaded programming on the