How it works…
We began by preparing the input and output resources, using an SRV as the input and a
UAV as the output. Because we create the target Texture2D from the source image's texture
description, we only have to set desc.BindFlags to BindFlags.UnorderedAccess, and we are
ready to create the texture and its UAV.
After assigning the shader resources and compiling the compute shader, we dispatch a
number of thread groups based on the source image size. We use Math.Ceiling to ensure
that the entire image is covered even when its dimensions are not evenly divisible by the
thread group size (anything read outside the bounds of the SRV will be black, and anything
written outside the bounds of the UAV will be discarded):
// 640x480 -> Dispatch(40, 120, 1);
context.Dispatch((int)Math.Ceiling(desc.Width / 16.0),
(int)Math.Ceiling(desc.Height / 4.0), 1);
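The Math.Ceiling call can also be expressed with integer arithmetic. The following is a minimal sketch that assumes a hypothetical helper class named ThreadGroups (it is not part of SharpDX); Count returns the same value as (int)Math.Ceiling(size / (double)groupSize) for non-negative sizes and positive group sizes.

```csharp
using System;

// Hypothetical helper (not a SharpDX API): how many thread groups are
// needed so that groupCount * groupSize >= size along one dimension.
public static class ThreadGroups
{
    public static int Count(int size, int groupSize)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));

        // Integer ceiling division; the long cast avoids overflow when
        // size is close to int.MaxValue.
        return (int)(((long)size + groupSize - 1) / groupSize);
    }
}
```

For example, ThreadGroups.Count(640, 16) gives 40 and ThreadGroups.Count(480, 4) gives 120, matching the Dispatch(40, 120, 1) call above. For a 642-pixel-wide image, Count(642, 16) gives 41, so 41 x 16 = 656 threads are launched per row and the 14 threads past the right edge write outside the UAV, where their writes are discarded.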
In our shader code, we have defined the input texture as we would for any SRV; it uses the first
texture slot (t0). Our output resource is assigned to the first UAV slot (u0).
We have set the number of threads per thread group to 64 by specifying the
[numthreads(16,4,1)] attribute, where the 16 and 4 come from our shader macros THREADSX
and THREADSY. The minimum recommended thread group size is 64 on AMD hardware (one
wavefront) and 32 on NVIDIA hardware (one warp), and the thread group size should generally
be a multiple of this minimum. Shader Model 5 supports at most 1024 threads per thread
group. The optimal thread group size depends on the hardware and on the work the compute
shader performs; in our performance tests, the best sizes for this shader were
16x4x1 and 32x32x1.
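The sizing rules in the preceding paragraph can be checked mechanically. This sketch uses a hypothetical ThreadGroupSize class; the per-axis limits (X and Y up to 1024, Z up to 64) are the cs_5_0 values of the D3D11_CS_THREAD_GROUP_MAX_* constants in d3d11.h, and the SIMD width passed in (32 or 64) is the caller's assumption about the target GPU.

```csharp
using System;

// Hypothetical helper for sanity-checking a [numthreads(x, y, z)] choice.
public static class ThreadGroupSize
{
    // cs_5_0 limits, mirroring the D3D11_CS_THREAD_GROUP_MAX_* constants.
    public const int MaxThreadsPerGroup = 1024;
    public const int MaxX = 1024;
    public const int MaxY = 1024;
    public const int MaxZ = 64;

    // True if the group dimensions are legal for Shader Model 5.0.
    public static bool IsValidForSM5(int x, int y, int z)
    {
        if (x < 1 || y < 1 || z < 1) return false;
        if (x > MaxX || y > MaxY || z > MaxZ) return false;
        return (long)x * y * z <= MaxThreadsPerGroup;
    }

    // True if the total thread count is a whole multiple of the assumed
    // SIMD width (e.g. 32 for an NVIDIA warp, 64 for an AMD wavefront).
    public static bool IsMultipleOf(int x, int y, int z, int simdWidth)
    {
        if (simdWidth <= 0) throw new ArgumentOutOfRangeException(nameof(simdWidth));
        return ((long)x * y * z) % simdWidth == 0;
    }
}
```

With these checks, 16x4x1 (64 threads) and 32x32x1 (1024 threads) are both legal and are multiples of both 32 and 64, whereas 32x32x2 exceeds the 1024-thread limit. Passing the checks only means a size is legal and warp/wavefront friendly; the fastest size still has to be found by measurement.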
Given a source image with the dimensions 640x480, we dispatch 40x120x1 groups
of 16x4x1 threads. Because 40 x 16 = 640 and 120 x 4 = 480, this creates exactly one
thread for each pixel of the source image.
Compute shader functions support the following input semantics: SV_GroupID,
SV_GroupThreadID, SV_DispatchThreadID, and SV_GroupIndex. These values are
used to identify the current thread and to index into any resources the thread requires.
The following diagram shows how each of these input semantics is calculated for an
image with the dimensions 640x480, using [numthreads(32,32,1)] for the number of
threads per group and dispatching the thread groups with context.Dispatch(20,15,1).
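The values in this 640x480 example follow fixed formulas: SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, and SV_GroupIndex flattens SV_GroupThreadID as z * numthreads.x * numthreads.y + y * numthreads.x + x. The sketch below models them on the CPU, with C# tuples standing in for HLSL uint3; the ComputeSemantics class and its methods are hypothetical and are not a Direct3D or SharpDX API.

```csharp
using System;

// Hypothetical CPU-side model of the compute shader system values.
public static class ComputeSemantics
{
    // SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
    public static (int X, int Y, int Z) DispatchThreadId(
        (int X, int Y, int Z) groupId,
        (int X, int Y, int Z) groupThreadId,
        (int X, int Y, int Z) numThreads) =>
        (groupId.X * numThreads.X + groupThreadId.X,
         groupId.Y * numThreads.Y + groupThreadId.Y,
         groupId.Z * numThreads.Z + groupThreadId.Z);

    // SV_GroupIndex: SV_GroupThreadID flattened to a single index within the group.
    public static int GroupIndex(
        (int X, int Y, int Z) groupThreadId,
        (int X, int Y, int Z) numThreads) =>
        groupThreadId.Z * numThreads.X * numThreads.Y
        + groupThreadId.Y * numThreads.X
        + groupThreadId.X;

    // Inverse mapping for a 2D image (non-negative pixel coordinates):
    // which SV_GroupID and SV_GroupThreadID handle pixel (px, py)?
    public static ((int X, int Y, int Z) GroupId, (int X, int Y, int Z) GroupThreadId) ForPixel(
        int px, int py, (int X, int Y, int Z) numThreads) =>
        ((px / numThreads.X, py / numThreads.Y, 0),
         (px % numThreads.X, py % numThreads.Y, 0));
}
```

With numthreads(32,32,1), pixel (100, 50) is handled by the thread with SV_GroupID (3, 1, 0), SV_GroupThreadID (4, 18, 0), and SV_GroupIndex 18 * 32 + 4 = 580, and DispatchThreadId maps those back to (100, 50, 0). The last pixel, (639, 479), falls in group (19, 14, 0), the last of the 20x15 groups.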
Threads and thread groups execute in an undefined order with respect to each other during
a single Dispatch call. HLSL provides synchronization intrinsics (for example,
GroupMemoryBarrierWithGroupSync) only for threads within the same thread group. These
allow threads in a group to communicate with each other through thread group-shared
memory (variables declared groupshared).
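The value of a group barrier is easiest to see when each thread needs data written by a different thread. The sketch below is a CPU analogy only, not GPU code: a plain array stands in for groupshared memory, one System.Threading.Thread stands in for each shader thread, and Barrier.SignalAndWait plays the role of HLSL's GroupMemoryBarrierWithGroupSync. The GroupSharedDemo class and its RotateLeft method are hypothetical names.

```csharp
using System;
using System.Threading;

// CPU analogy of a thread group using shared memory plus a group barrier.
public static class GroupSharedDemo
{
    // Each simulated thread stores its input value into "shared" memory,
    // waits for the whole group, then reads its right-hand neighbour's slot.
    public static int[] RotateLeft(int[] input)
    {
        int n = input.Length;
        var shared = new int[n];   // stands in for a groupshared array
        var output = new int[n];

        using (var barrier = new Barrier(n))
        {
            var threads = new Thread[n];
            for (int i = 0; i < n; i++)
            {
                int groupIndex = i; // stands in for SV_GroupIndex
                threads[i] = new Thread(() =>
                {
                    shared[groupIndex] = input[groupIndex];

                    // Without this wait, a thread could read its neighbour's
                    // slot before the neighbour has written it.
                    barrier.SignalAndWait();

                    output[groupIndex] = shared[(groupIndex + 1) % n];
                });
                threads[i].Start();
            }
            foreach (var t in threads) t.Join();
        }
        return output;
    }
}
```

RotateLeft(new[] { 1, 2, 3, 4 }) returns { 2, 3, 4, 1 }. Removing the SignalAndWait call reintroduces the read-before-write race that the barrier prevents within a thread group.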