The Computation Pipeline - Practical Rendering and Computation with Direct3D 11

in HLSL, these registers are not directly used and are not directly visible to the developer.
However, the shader programs are eventually compiled into an assembly form before they
can be used to create shader objects for use in the pipeline. It is also possible to
obtain an assembly listing of a compiled shader (using the fxc.exe tool), which allows
inspection of the low-level details of a program.

Because of this, it is worth understanding some of the register functionality. The basic
architecture of the common shader core was covered in Chapter 3, so please review
that chapter for reference during this discussion. In this section, we are most interested
in the temporary registers. These registers can be used to hold intermediate calculations
during execution of a shader program. They are only accessible to the thread that is
currently executing, and are typically extremely fast. Up to 4096 temporary registers
(r# and x# combined) are specified in the common shader core, which seems like a fairly
large number for small-scale shader programs. However, while the hardware implementation
must be able to handle 4096 registers, it doesn't necessarily need to provide each
processing core with its own set of registers. Many architectures share a pool of registers
among many processing cores, meaning that the overall available number of registers can in
reality be less than 4096. While this level of detail is hidden from the developer, the upper
limit of 4096 registers is a good indication of how much data can be stored in these registers
during the execution of a shader program.
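To make the effect of a shared register pool concrete, the arithmetic can be sketched as
follows. This is only an illustration: the helper name registersAvailablePerCore, the
even-split policy, and the pool sizes used with it are assumptions, not figures from any real
GPU. Only the 4096 limit comes from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Upper limit on temporary registers (r# and x# combined) named in the text.
constexpr std::size_t kSpecTempRegisterLimit = 4096;

// Hypothetical helper: if a pool of `poolSize` registers is split evenly among
// `coresSharingPool` processing cores, return how many registers each core can
// actually use. The result is capped at the specified limit of 4096.
std::size_t registersAvailablePerCore(std::size_t poolSize,
                                      std::size_t coresSharingPool) {
    assert(coresSharingPool > 0);
    // Integer division: any remainder is simply left unused in this sketch.
    return std::min(kSpecTempRegisterLimit, poolSize / coresSharingPool);
}
```

With an assumed pool of 16384 registers, sharing among 2 cores still leaves each core the full
4096, while sharing among 64 cores leaves only 256 each. Real hardware may divide its register
file quite differently.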

The temporary registers used in this processor architecture can be considered the
first class of memory that the compute shader can use. Due to its access speed, this
storage mechanism is the first choice for data that must be kept available within the
shader program. Selection and allocation of these registers are performed automatically
by the compiler, so in general, once data has been loaded into the shader core, the program
will use temporary registers as much as possible.

5.3.2 Device Memory

While the register-based memory is very fast, it is finite in size. In addition, the desired data
must be loaded into the shader core before the registers can be used, and after a thread has
completed its shader program, the contents of these registers are reset for the next shader
program. Of course, we need data that persists between executions of a shader
program, and we also need much larger storage areas. This need is filled by the
memory resources that we have already seen many times throughout the book. In the
compute shader context, these resources are commonly referred to as device memory resources
to distinguish them from the other available types of memory.
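The difference between register contents and device memory can be illustrated with a CPU-side
analogy. This is a minimal sketch, not Direct3D code: runThread and dispatch are hypothetical
names, a local variable stands in for a temporary register, and the two vectors stand in for
device memory resources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side analogy only: one call stands in for one thread running the
// shader program once.
void runThread(std::size_t threadId,
               const std::vector<float>& input,  // stands in for readable device memory
               std::vector<float>& output) {     // stands in for writable device memory
    assert(threadId < input.size() && threadId < output.size());
    // Data must first be loaded before it can be worked on "in registers".
    float temp = input[threadId] * 2.0f;  // local: plays the role of a temporary register
    temp += 1.0f;
    // The local disappears when the call returns; only this write persists.
    output[threadId] = temp;
}

// Hypothetical "dispatch": run the program once per element, sequentially.
void dispatch(const std::vector<float>& input, std::vector<float>& output) {
    assert(input.size() == output.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        runThread(i, input, output);
    }
}
```

After dispatch returns, nothing remains of temp, but output holds one result per thread. In the
same way, anything a shader program needs to keep beyond a single execution has to be written to
a device memory resource.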

Direct3D 11 provides a significant array of resource types that can be used for read-only,
write-only, or read/write access. The compute shader can use shader resource views
and unordered access views to access device memory resources. These two resource views
allow read-only and read/write access, respectively. In addition, the compute shader is also able
