lists all values returned by the shader. It also indicates if the pixel shader runs at per-sample
frequency, which is due to taking SV_SampleIndex as an input.
After the diagnostic information, the output also contains the fully compiled shader
program in assembly. While it's not typically useful to examine the generated assembly,
it can be helpful for verification purposes during performance analysis. In particular, it is
common to look for dynamic branching or looping constructs, since these can have a drastic
effect on performance. Dynamic branches can be spotted by looking for an if_comp instruction,
where comp is a two-letter abbreviation of a comparison. For instance, lt will be
used for a less-than comparison, and ge for a greater-than-or-equal comparison. Dynamic
loops will begin with a rep instruction. The output also contains an instruction count at the
end, which is simply the number of assembly instructions in the program.
While it is possible to get a very rough estimate of the relative performance cost of
a shader through this number, in general, it is not a reliable figure. This is because shader
assembly is merely an intermediate format that is further compiled by the driver into a
microcode format that can be executed by the hardware. Thus, the final program could
have a very different number of instructions, and the number of cycles required to execute
those instructions could also vary, depending on the hardware. More importantly, even the
actual microcode instructions will not properly reflect the larger-scale performance
characteristics caused by memory accesses and multithreaded execution. To obtain more accurate
performance statistics regarding a shader program, specialized analysis and profiling tools
are available from the major graphics hardware vendors.
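The manual check described above (looking for if_comp and rep instructions in the assembly listing) can be automated with a few lines of script. The following is a minimal sketch in Python, not part of any compiler toolchain. The function name find_dynamic_flow_control and the sample listing are hypothetical: the listing is hand-written for illustration and is not real compiler output. The sketch also assumes that each instruction sits on its own line and that comments begin with //.

```python
# Hand-written listing for illustration only; this is NOT real compiler output.
SAMPLE_LISTING = """\
ps_3_0
dcl_texcoord v0.xy
if_lt v0.x, c0.x      // dynamic branch, less-than comparison
  mov r0, c1
else
  mov r0, c2
endif
rep i0                // dynamic loop
  add r0, r0, c3
endrep
mov oC0, r0
"""


def find_dynamic_flow_control(listing):
    """Return (branches, loops) found in a shader assembly listing.

    branches is a list of (line_number, comparison) tuples, one per
    if_<comp> instruction. loops is a list of line numbers, one per
    rep instruction. Line numbers are 1-based.
    """
    branches = []
    loops = []
    for number, line in enumerate(listing.splitlines(), start=1):
        # Assumption: everything after // is a comment and can be dropped.
        code = line.split("//", 1)[0].strip()
        if not code:
            continue
        opcode = code.split()[0]
        if opcode.startswith("if_"):
            # The suffix after "if_" is the comparison, e.g. lt or ge.
            branches.append((number, opcode[len("if_"):]))
        elif opcode == "rep":
            loops.append(number)
    return branches, loops


if __name__ == "__main__":
    found_branches, found_loops = find_dynamic_flow_control(SAMPLE_LISTING)
    for number, comparison in found_branches:
        print(f"line {number}: dynamic branch ({comparison})")
    for number in found_loops:
        print(f"line {number}: dynamic loop")
```

A scan like this only flags candidate lines for closer inspection. For the reasons given above, it says nothing reliable about how the shader will actually perform on a given piece of hardware.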