CPU can be found in Chapter 7, "Multithreaded Rendering"). How can such a massive
number of threads be efficiently synchronized without losing the performance that the
GPU's parallelism provides? Fortunately, several different mechanisms are available for
synchronizing the threads of a thread group. We will explore each of these possibilities in
the following sections.
5.4.1 Memory Barriers
We will first look at the highest-level synchronization techniques, referred to as memory
barriers. HLSL provides a number of intrinsic functions that can be used to synchronize
memory accesses across all threads in a thread group. Note that they synchronize
only the threads within a single thread group, and not the threads across
an entire dispatch. These functions have two properties that differentiate them from one
another. The first is the class of memory that the threads are synchronizing across when the
function is called. It is possible to synchronize access to the group shared memory, device
memory, or both. The second property specifies whether all of the threads in a given thread
group are synchronized to the same point within their execution. These two properties
provide a range of different synchronization behaviors for the developer to choose from. The
different versions of these intrinsic functions are listed in Table 5.1 below.
Without Group Synchronization    With Group Synchronization
GroupMemoryBarrier()             GroupMemoryBarrierWithGroupSync()
DeviceMemoryBarrier()            DeviceMemoryBarrierWithGroupSync()
AllMemoryBarrier()               AllMemoryBarrierWithGroupSync()
Table 5.1. Intrinsic Functions: without and with group synchronization.
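To make the table more concrete, the following is a minimal sketch of a compute shader that uses one of these intrinsics. The resource names (InputData, OutputData), the group size, the SharedValues array, and the entry point name CSMain are all hypothetical and chosen only for illustration. The sketch also assumes that the dispatch covers the input buffer exactly, so no bounds checks are shown.

```hlsl
// Hypothetical resources for illustration; names and registers are assumptions.
StructuredBuffer<float>   InputData  : register(t0);
RWStructuredBuffer<float> OutputData : register(u0);

#define GROUP_SIZE 64

// Group shared memory: visible to every thread of one thread group.
groupshared float SharedValues[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID,
            uint  groupIndex       : SV_GroupIndex)
{
    // Each thread writes one value into the group shared memory.
    SharedValues[groupIndex] = InputData[dispatchThreadID.x];

    // Block until the group shared memory writes are complete and every
    // thread in the group has reached this point in its execution.
    GroupMemoryBarrierWithGroupSync();

    // Read a value that was written by a different thread of the same group.
    uint neighbor = (groupIndex + 1) % GROUP_SIZE;
    OutputData[dispatchThreadID.x] = SharedValues[groupIndex] + SharedValues[neighbor];
}
```

The variant with group synchronization is used in this sketch because each thread reads a value that another thread must already have written, so all threads need to have passed their write before any of them performs the read.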
Each of these functions will block a thread from continuing until that function's
particular conditions have been met. The first function, GroupMemoryBarrier(), blocks a
thread's execution until all writes to the group shared memory from all threads in a thread
group have been completed. This ensures that when threads share data with one another
through the group shared memory, the desired values have had a chance to be written
before they are read by other threads. There is an important
distinction here between the shader core executing a write instruction and that instruction
actually being carried out by the GPU's memory system and written to memory,
where it would then be available to other threads. Depending on the hardware
implementation, there can be a variable amount of time between writing a value and when it
actually ends up at its destination. By performing a blocking operation until these writes