Dynamic Tessellation - Practical Rendering and Computation with Direct3D 11

4. SV_GroupIndex

This uint gives the flattened index of the thread within the current group. For a 16x16 thread group, this value will be between 0 and 255. For the purpose of this algorithm, it is essentially the thread ID, used only to coordinate work across the group.
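
To make the flattening concrete, the following minimal sketch recomputes the index by hand from SV_GroupThreadID and compares it with SV_GroupIndex. The entry point name CSMain and the mismatchCount buffer are assumptions made for illustration; they are not part of the algorithm described here.

```hlsl
// Hypothetical debug buffer (illustration only): counts threads whose
// hand-computed index disagrees with SV_GroupIndex.
RWStructuredBuffer<uint> mismatchCount : register(u0);

[numthreads(16, 16, 1)]
void CSMain(uint3 groupThreadID : SV_GroupThreadID,
            uint  groupIndex    : SV_GroupIndex)
{
    // Group dimensions; these must match the numthreads attribute above.
    const uint3 dims = uint3(16, 16, 1);

    // Flatten the 3D position within the group: z-major, then y, then x.
    uint flattened = groupThreadID.z * dims.x * dims.y
                   + groupThreadID.y * dims.x
                   + groupThreadID.x;

    // For a 16x16x1 group both values lie in [0, 255] and should agree,
    // so this counter is expected to stay at zero.
    if (flattened != groupIndex)
    {
        InterlockedAdd(mismatchCount[0], 1);
    }
}
```

Because the index is unique per thread within a group, it is a convenient subscript into per-group arrays sized to the thread count.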

The final piece of the puzzle is the ability for threads to communicate with each other. This is done through a 4-KB chunk of group shared memory and a set of synchronization intrinsics. Variables declared at global scope with the groupshared prefix, such as those shown in Listing 9.11, can be both read from and written to by all threads in the current group.

groupshared float  groupResults[16 * 16];
groupshared float4 plane;
groupshared float3 rawNormals[2][2];
groupshared float3 corners[2][2];

Listing 9.11. Compute shader state declarations.

Synchronization is done through a choice of six barrier functions. Each is available as either a *MemoryBarrier() or a *MemoryBarrierWithGroupSync() call. The former blocks until memory operations have finished, but progress can continue before the remaining ALU instructions complete. The latter blocks until all threads in the group have reached the specified point, so both memory and arithmetic instructions must be complete. The barrier prefix can be All, Device, or Group, with decreasing scope at each level. Thus, AllMemoryBarrierWithGroupSync() is the heaviest intrinsic to employ, whereas GroupMemoryBarrier() is the most lightweight. In this algorithm, only GroupMemoryBarrierWithGroupSync() is used. Figure 9.17 shows the first phase of the algorithm.

Figure 9.17. Compute shader phase one.
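
As a rough illustration of how group shared memory and GroupMemoryBarrierWithGroupSync() work together, the sketch below has every thread write one slot of a groupshared array, waits at the barrier, and then lets a single thread combine the results. It is not the phase-one code of this algorithm: the entry point CSMain, the groupSums output buffer, the placeholder per-thread value, and the one-dimensional dispatch are all assumptions made for the example.

```hlsl
// Hypothetical output buffer (illustration only): one float per thread group.
RWStructuredBuffer<float> groupSums : register(u0);

// One slot per thread in a 16x16 group, as in Listing 9.11.
groupshared float groupResults[16 * 16];

[numthreads(16, 16, 1)]
void CSMain(uint3 groupID    : SV_GroupID,
            uint  groupIndex : SV_GroupIndex)
{
    // Each thread writes exactly one slot, so no two threads collide.
    // The value is a placeholder standing in for real per-thread work.
    groupResults[groupIndex] = (float)groupIndex;

    // Block until every thread in the group has reached this point and
    // all of their group shared writes are visible to the whole group.
    GroupMemoryBarrierWithGroupSync();

    // A single thread can now safely read every slot.
    if (groupIndex == 0)
    {
        float sum = 0.0f;
        for (uint i = 0; i < 16 * 16; ++i)
        {
            sum += groupResults[i];
        }

        // Assumes a one-dimensional dispatch, so groupID.x identifies the group.
        groupSums[groupID.x] = sum;
    }
}
```

Without the barrier, thread 0 could begin summing before the other threads have written their slots. The barrier must also be reached by every thread in the group, which is why it sits outside the if block.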
