4. SV_GroupIndex
This uint gives the flattened index of the thread within the current group. For a
16x16 group, this value will be between 0 and 255. For the purpose of this algorithm,
it is essentially the thread ID, used only to coordinate work across the group.
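As a minimal sketch of what "flattened" means here, the shader below recomputes the index from SV_GroupThreadID for a 16x16x1 group and compares it with the system-supplied SV_GroupIndex. The output buffer indexMatches, the entry point name CSMain, and the one-dimensional dispatch layout are assumptions made for illustration and are not part of the chapter's code.

```hlsl
// Hypothetical output buffer: one uint per thread, 1 where the indices agree.
RWStructuredBuffer<uint> indexMatches : register(u0);

[numthreads(16, 16, 1)]
void CSMain(uint3 groupThreadID : SV_GroupThreadID,
            uint3 groupID       : SV_GroupID,
            uint  groupIndex    : SV_GroupIndex)
{
    // Flattened index for a 16x16x1 group:
    // z * (dimX * dimY) + y * dimX + x
    uint flattened = groupThreadID.z * (16 * 16)
                   + groupThreadID.y * 16
                   + groupThreadID.x;

    // Assumes Dispatch(n, 1, 1), so each group owns 256 consecutive slots.
    uint slot = groupID.x * (16 * 16) + groupIndex;

    indexMatches[slot] = (flattened == groupIndex) ? 1u : 0u;
}
```

With a row width of 16, the thread at (x, y) = (3, 2) gets index 2 * 16 + 3 = 35, and the last thread at (15, 15) gets 255.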
The final piece in the puzzle is the ability for threads to communicate with each
other. This is done through a chunk of group shared memory (up to 32 KB per group
under shader model 5.0) and synchronization intrinsics. Variables declared at global
scope with the groupshared prefix, such as those shown in Listing 9.11, can be both
read from and written to by all threads in the current group.
groupshared float  groupResults[16 * 16];
groupshared float4 plane;
groupshared float3 rawNormals[2][2];
groupshared float3 corners[2][2];
Listing 9.11. Compute shader state declarations.
Synchronization is done through a choice of six barrier functions. The code can be
authored with either a *MemoryBarrier() or a *MemoryBarrierWithGroupSync() call.
The former blocks until memory operations have finished, but execution can continue
before the remaining ALU instructions complete. The latter blocks until all threads in
the group have reached the specified point, so both memory and arithmetic instructions
must be complete. The barrier prefix can be All, Device, or Group, with decreasing
scope at each level. Thus, AllMemoryBarrierWithGroupSync() is the heaviest intrinsic to
employ, whereas GroupMemoryBarrier() is more lightweight. In this algorithm, only
GroupMemoryBarrierWithGroupSync() is used. Figure 9.17 shows the first phase of the
algorithm.
Figure 9.17. Compute shader phase one.
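To illustrate how group shared memory and GroupMemoryBarrierWithGroupSync() work together, here is a minimal sketch of a per-group sum. It is not the chapter's algorithm: the buffers inputValues and groupSums, the entry point CSMain, and the one-dimensional dispatch are all assumptions made for this example. Only the groupResults declaration is borrowed from Listing 9.11.

```hlsl
// Hypothetical resources, introduced only for this sketch.
StructuredBuffer<float>   inputValues : register(t0); // 256 values per group
RWStructuredBuffer<float> groupSums   : register(u0); // one result per group

// Same shape as the array in Listing 9.11: one slot per thread.
groupshared float groupResults[16 * 16];

[numthreads(16, 16, 1)]
void CSMain(uint3 groupID    : SV_GroupID,
            uint  groupIndex : SV_GroupIndex)
{
    // Each thread writes only its own slot, so there are no write conflicts.
    // Assumes Dispatch(n, 1, 1) and an input of at least n * 256 elements.
    groupResults[groupIndex] = inputValues[groupID.x * 256 + groupIndex];

    // Wait until every thread in the group has written its slot.
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction: halve the number of active threads on each pass.
    // The loop bounds are the same for every thread, so all threads reach
    // each barrier the same number of times.
    for (uint stride = 128; stride > 0; stride >>= 1)
    {
        if (groupIndex < stride)
        {
            groupResults[groupIndex] += groupResults[groupIndex + stride];
        }

        // Partial sums must be visible before the next pass reads them.
        GroupMemoryBarrierWithGroupSync();
    }

    // Thread 0 now holds the total for the whole group.
    if (groupIndex == 0)
    {
        groupSums[groupID.x] = groupResults[0];
    }
}
```

The barrier sits outside the if statement on purpose: every thread in the group has to reach a *WithGroupSync() call, so it should not be placed inside a branch that only some threads take.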