in HLSL, these registers are not directly used and are not directly visible to the developer.
However, the shader programs are eventually compiled into an assembly form before they
can be used to create shader objects for use in the pipeline. In addition, it is also possible to
obtain an assembly listing of a compiled shader (using the fxc.exe tool), which does allow
for inspection of the low-level details of a program.
Because of this, it is worth understanding some of the register functionality. The basic
architecture of the common shader core was covered in Chapter 3, so please review that
chapter for reference during this discussion. In this section, we are most interested in the
temporary registers. These registers hold intermediate calculations during execution of a
shader program. They are accessible only to the thread that is currently executing, and are
typically extremely fast. Up to 4096 temporary registers (r# and x# combined) are specified
in the common shader core, which seems like a fairly large number for small-scale shader
programs. However, while the hardware implementation must be able to handle 4096 registers,
it doesn't necessarily need to provide each processing core with its own set of registers.
Many architectures share a pool of registers among many processing cores, meaning that the
number of registers actually available can in reality be less than 4096. While this level of
detail is hidden from the developer, the upper limit of 4096 registers is a good indication
of how much data can be stored in these registers during the execution of a shader program.
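To put the 4096-register limit and the shared-pool remark into numbers, the following is a minimal C++ sketch of the arithmetic. It assumes that each temporary register is a four-component vector of 32-bit values, as in the Direct3D shader assembly model. The function names and the pool size of 16384 registers are hypothetical values chosen for illustration only; they do not describe any particular GPU.

```cpp
#include <cstdint>

// Upper limit on temporary registers (r# and x# combined) given in the text.
constexpr std::uint32_t kMaxTempRegisters = 4096;

// Assumption: one register holds four 32-bit components, i.e. 16 bytes.
constexpr std::uint32_t kBytesPerRegister = 4u * 4u;

// Bytes of per-thread storage represented by a given number of registers.
constexpr std::uint32_t tempStorageBytes(std::uint32_t registers) {
    return registers * kBytesPerRegister;
}

// Hypothetical model of a shared register pool: how many threads can hold
// their registers at the same time if every thread needs registersPerThread.
constexpr std::uint32_t residentThreads(std::uint32_t poolRegisters,
                                        std::uint32_t registersPerThread) {
    return registersPerThread == 0 ? 0 : poolRegisters / registersPerThread;
}

// The full 4096-register limit corresponds to 64 KiB per thread.
static_assert(tempStorageBytes(kMaxTempRegisters) == 65536, "64 KiB");

// With a hypothetical pool of 16384 registers, a shader that needs 16
// registers per thread fits four times as many threads as one needing 64.
static_assert(residentThreads(16384, 16) == 1024, "small shader");
static_assert(residentThreads(16384, 64) == 256, "large shader");
```

Under this simplified model, a shader that uses fewer temporary registers leaves room for more threads to share the pool, which is one reason the practical budget per thread can be well below the specified maximum.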
The temporary registers used in this processor architecture can be considered the
first class of memory that the compute shader can use. Due to its access speed, this storage
mechanism is the first choice for data that must be kept available within the shader
program. Selection and allocation of these registers are performed automatically by the
compiler, so in general, once data has been loaded into the shader core, the compiled
program will use temporary registers as much as possible.
5.3.2 Device Memory
While the register-based memory is very fast, it is finite in size. In addition, the desired data
must be loaded into the shader core before the registers can be used, and after a thread has
completed its shader program, the contents of these registers are reset for the next shader
program. Of course, we need data that persists between executions of a shader program, and
we also need much larger storage areas. This need is filled by the memory resources that we
have already seen many times throughout the book. In the compute shader context, these
resources are commonly referred to as device memory resources to distinguish them from the
other available types of memory.
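The difference between per-thread registers and persistent device memory can be illustrated with a small CPU-side analogy. This is plain C++, not Direct3D code: `DeviceBuffer`, `runThread`, and `dispatch` are hypothetical names standing in for a device memory resource, one thread's shader program, and a dispatch call. Local variables play the role of temporary registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a device memory resource: it outlives any single dispatch.
using DeviceBuffer = std::vector<float>;

// Stand-in for the shader program run by one thread.
void runThread(std::size_t threadId, DeviceBuffer& buffer) {
    assert(threadId < buffer.size());

    // Locals act like temporary registers: private to this invocation and
    // gone once it returns.
    float value = buffer[threadId];  // load from device memory
    float doubled = value * 2.0f;    // intermediate calculation

    // Only data written back to device memory survives the invocation.
    buffer[threadId] = doubled;
}

// Stand-in for a dispatch: run the program once per buffer element.
void dispatch(DeviceBuffer& buffer) {
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        runThread(i, buffer);
    }
}
```

Calling `dispatch` twice on a buffer holding 1, 2, 3 leaves it holding 4, 8, 12. The second pass sees the results of the first only because they were stored in the buffer, not in the locals.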
Direct3D 11 provides a wide array of resource types that can be used for read-only,
write-only, or read/write access. The compute shader can use shader resource views
and unordered access views to access device memory resources. These two view types
allow read-only and read/write access, respectively. In addition, the compute shader is also able