Multithreaded Rendering - Practical Rendering and Computation with Direct3D 11


Our view-level multithreading granularity provides good potential for reduced CPU overhead. Some rendering passes use very similar rendering effects for most, if not all, of the scene objects. Consider a shadow map generation pass: most objects will use the exact same pixel shader to output the appropriate depth value, and most will also use a similar transformation shader setup (vertex and/or tessellation-based shaders), with some variations for static versus dynamic geometry. These types of rendering passes are typically presorted at the view level, so all of the operations that a deferred context executes to set up a rendering pass (such as setting render targets or stencil setup) will be amortized over many draw calls.
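The amortization described above can be illustrated with a small, portable model. This is only a sketch under stated assumptions: SceneObject, RecordedPass, and RecordShadowView are hypothetical names invented for illustration, the commands are plain strings, and nothing here calls the Direct3D 11 API. It records pass-wide state once, presorts the objects by geometry kind, and switches the vertex shader only when the kind changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Direct3D 11 types.
enum class GeometryKind { Static, Dynamic };

struct SceneObject {
    std::string name;
    GeometryKind kind;
};

struct RecordedPass {
    std::vector<std::string> commands;  // everything recorded for the view
    std::size_t setupCalls = 0;         // state-setting commands
    std::size_t drawCalls = 0;          // draw commands
};

// Records one shadow-map pass for a whole view into a single command stream.
RecordedPass RecordShadowView(std::vector<SceneObject> objects) {
    RecordedPass pass;
    auto setup = [&pass](const std::string& cmd) {
        pass.commands.push_back(cmd);
        ++pass.setupCalls;
    };

    // Pass-wide state is recorded once for the whole view.
    setup("SetRenderTarget(shadowMap)");
    setup("SetPixelShader(depthOnly)");

    // Presort so that objects sharing a transformation setup are adjacent.
    std::stable_sort(objects.begin(), objects.end(),
                     [](const SceneObject& a, const SceneObject& b) {
                         return a.kind < b.kind;
                     });

    bool haveKind = false;
    GeometryKind current = GeometryKind::Static;
    for (const SceneObject& obj : objects) {
        // Switch the vertex shader only when the geometry kind changes.
        if (!haveKind || obj.kind != current) {
            setup(obj.kind == GeometryKind::Static ? "SetVertexShader(static)"
                                                   : "SetVertexShader(skinned)");
            current = obj.kind;
            haveKind = true;
        }
        pass.commands.push_back("Draw(" + obj.name + ")");
        ++pass.drawCalls;
    }
    assert(pass.commands.size() == pass.setupCalls + pass.drawCalls);
    return pass;
}
```

In this model the number of setup commands depends on how many distinct state configurations the view contains, not on how many objects it draws, which is the sense in which the setup cost is spread over the draw calls.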

However, because its command lists are generally larger, this view-based processing doesn't have much of a chance to reuse command lists from frame to frame. As discussed above, it is possible to update the dynamic state of objects by modifying the contents of the resources used, but this does not allow the application to change which objects are rendered and which are culled in a given frame. This should not be seen as a critical problem, since relatively few view-sized command lists are generated for each frame.

Per-Object Command Lists

The next level of granularity we could use is to render the scene at the individual object level. In this scheme, the worker threads would generate one command list for each object that will be rendered. This introduces a much finer level of processing and consequently increases the number of command lists that must be generated and executed. Because of the larger number of command lists, it is probably advantageous to use deferred context state propagation between command list generations. This would alleviate the need to repeat higher-level rendering setup calls, such as setting render targets, for every one of the many command lists. The additional command lists would also require a higher number of FinishCommandList() and ExecuteCommandList() calls, which may or may not impact performance. Since the command lists are generated in isolation from the rest of the scene, this approach also reduces the amount of batching that can be performed.
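The trade-off described above can be made concrete with some simple bookkeeping. The following is a rough accounting sketch, not a measurement and not Direct3D 11 code: PassCost, ViewLevelCost, and PerObjectCost are hypothetical names, and the model assumes one draw per object, a fixed number of pass-level setup commands, and that state propagation lets that setup be recorded only once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one rendering pass; not Direct3D 11 API.
struct PassCost {
    std::size_t commandLists;   // lists produced (one finish and one execute each)
    std::size_t setupCommands;  // pass-level setup commands recorded in total
    std::size_t drawCommands;   // one draw per object in this model
};

// One command list for the whole view: setup is recorded once.
PassCost ViewLevelCost(std::size_t objects, std::size_t setupPerPass) {
    return PassCost{1, setupPerPass, objects};
}

// One command list per object. If pass-level state carries over between
// lists (propagateState), setup is recorded once; otherwise every list
// repeats it. This is an assumption of the model, taken from the text.
PassCost PerObjectCost(std::size_t objects, std::size_t setupPerPass,
                       bool propagateState) {
    assert(objects > 0);
    const std::size_t setup =
        propagateState ? setupPerPass : setupPerPass * objects;
    return PassCost{objects, setup, objects};
}
```

For example, with 200 objects and 3 setup commands per pass, the per-object scheme records 600 setup commands without propagation and 3 with it, while the number of command lists to finish and execute is 200 either way, compared with 1 for the view-level scheme.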

However, this paradigm also has some interesting side effects that could prove to be beneficial. Once a command list is generated for a particular object for a particular rendering pass, there is likely no reason to release and recreate the command list for every frame. Since any per-frame dynamic rendering data, such as view or skinning matrices, is provided to the shader programs in constant buffers, the command list will not change from frame to frame as long as the same constant buffers are updated and used for every frame. With no per-frame overhead for generating the command list, the additional costs discussed above could largely be overcome. The simplicity of such a scheme is also attractive: each object would simply receive its own command list and use it as necessary.
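The reuse idea above depends on the command list referring to a constant buffer rather than copying its contents. A minimal portable sketch of that relationship follows. All names here (ConstantBuffer, CommandList, RenderObject, RecordCommandList, RenderFrame) are hypothetical stand-ins written for illustration and are not the Direct3D 11 interfaces; a single float stands in for per-frame data such as matrices, and a "draw" simply appends one transformed value to an output vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Direct3D 11 types.
struct ConstantBuffer {
    float worldOffset = 0.0f;  // stands in for per-frame data such as matrices
};

// A recorded command list: replayable commands that hold references to
// resources, not copies of their contents.
class CommandList {
public:
    using Command = std::function<void(std::vector<float>&)>;
    void Record(Command cmd) { commands_.push_back(std::move(cmd)); }
    void Execute(std::vector<float>& output) const {
        for (const Command& cmd : commands_) cmd(output);
    }
private:
    std::vector<Command> commands_;
};

struct RenderObject {
    std::shared_ptr<ConstantBuffer> constants = std::make_shared<ConstantBuffer>();
    CommandList commandList;        // generated once, reused every frame
    std::size_t timesRecorded = 0;  // how often the list was (re)built
};

// Builds the command list a single time. The draw command captures the
// buffer by shared pointer, so it sees whatever the buffer holds at
// execution time.
void RecordCommandList(RenderObject& obj, float localX) {
    std::shared_ptr<ConstantBuffer> cb = obj.constants;
    obj.commandList.Record([cb, localX](std::vector<float>& out) {
        out.push_back(localX + cb->worldOffset);  // "draw" one transformed vertex
    });
    ++obj.timesRecorded;
}

// Per-frame work: update the buffer contents, then replay the existing list.
std::vector<float> RenderFrame(RenderObject& obj, float worldOffset) {
    assert(obj.timesRecorded == 1 && "record the command list before rendering");
    obj.constants->worldOffset = worldOffset;  // only the data changes
    std::vector<float> output;
    obj.commandList.Execute(output);
    return output;
}
```

Calling RecordCommandList once and then RenderFrame with a different offset each frame produces different output each time while timesRecorded stays at 1. In an actual Direct3D 11 application the analogous per-frame steps would presumably be updating the constant buffer on the immediate context and then calling ExecuteCommandList() with the stored command list.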
