Reviewing the GPA trace in Figure 17.7, the frame shows the animation and
render tasksets executing in sequence. Wall time for the frame increased to 2.8 ms,
which is unexpected. The GPA trace shows that both animation and rendering take
about 1.8 ms, which is an improvement. The submit task, however, forces a
serialization point: the drivers used to gather these data do not support
multithreaded command lists. Internally, the D3D11 API records commands as tokens,
which are then played back in the ExecuteCommandList function. This multithreaded
emulation slightly increases the frame cost, yet all is not lost. Even on drivers
where multithreaded submission is not enabled, we can use the drain-out time
(defined below) with pipelining.
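As a toy illustration of the record-and-playback pattern described above (this is not the driver's actual internals, just a sketch of the idea): command recording can proceed concurrently on worker threads, but playback of the recorded tokens happens serially at submit time, which is why submission remains a serialization point.

```python
import threading

class DeferredContext:
    """Toy stand-in for a deferred context: records commands as tokens."""
    def __init__(self):
        self.tokens = []

    def record(self, command, *args):
        # Recording is cheap and can happen on any worker thread.
        self.tokens.append((command, args))

class ImmediateContext:
    """Toy immediate context: plays recorded tokens back serially."""
    def __init__(self):
        self.executed = []

    def execute_command_list(self, deferred):
        # Playback happens on one thread -- the serialization point.
        for command, args in deferred.tokens:
            self.executed.append((command, args))

# Two systems record in parallel; submission replays their tokens serially.
anim_ctx, render_ctx = DeferredContext(), DeferredContext()

t1 = threading.Thread(target=anim_ctx.record, args=("draw", "skinned_mesh"))
t2 = threading.Thread(target=render_ctx.record, args=("draw", "scene"))
t1.start(); t2.start(); t1.join(); t2.join()

immediate = ImmediateContext()
immediate.execute_command_list(anim_ctx)    # serial playback
immediate.execute_command_list(render_ctx)  # serial playback
```

The command names here are made up; the point is only that the cost of the serial playback loop is paid once per submit regardless of how many threads recorded.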
17.3.7 Pipelining Systems across Frames and Latency
The tasksets are now free of synchronization points, and their dependencies are
properly specified. There are surely many more algorithmic and implementation
optimizations possible for this example. From a tasking perspective, however, the
scheduling is as efficient as possible for this frame. The tasking system schedules
each game system's work as soon as its dependencies allow, and the systems' tasks
run concurrently. To get to the next level of tasking utilization, multiple
frames need to be in flight at once.
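A minimal sketch of this "run as soon as dependencies allow" scheduling, using Python's thread pool (the taskset names and dependency graph below are illustrative, not the chapter's actual systems):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical frame tasksets and their dependencies (names are illustrative).
DEPS = {
    "animation": [],
    "particles": [],            # independent of animation, so it can run concurrently
    "render":    ["animation", "particles"],
    "submit":    ["render"],
}

def run_frame(deps):
    """Launch every taskset; each one blocks only on its own dependencies."""
    futures = {}
    order = []
    with ThreadPoolExecutor(max_workers=len(deps)) as pool:
        def run(name):
            for d in deps[name]:
                futures[d].result()  # wait for this dependency to finish
            order.append(name)       # record completion order
            return name
        for name in deps:            # submit in dependency order
            futures[name] = pool.submit(run, name)
        for f in futures.values():
            f.result()
    return order

order = run_frame(DEPS)
```

Because each task waits only on its declared inputs, "animation" and "particles" may complete in either order, but "render" always follows both and "submit" always comes last.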
Pipelining in a thread-per-system game. With the thread-per-system model, the
number of frames in flight required to achieve maximum throughput is equal
to the number of dependent systems in a frame.³ Let us assume a game has three
systems A, B, and C and that the frame is CPU bound. System B depends on the
output of A, and system C depends on the output of B. If the game systems run
on one thread, the total frame time is the sum of the running times of A, B, and C,
and the latency is one frame:

Time(Frame) = Time(A) + Time(B) + Time(C).
If each system runs on its own thread, then to achieve maximum throughput the
latency will be three frames and

ExecTime(Frame) = max(ExecTime(A), ExecTime(B), ExecTime(C)).
Also, the memory footprint grows, since the inputs and outputs of the systems
must be queued. For simplicity, we can describe the memory usage as a function of
the latency, because each pipelined frame needs independent memory to operate on.
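The two formulas above can be checked with a small timing model. Using made-up per-system times, the recurrence `finish[s][f] = max(finish[s-1][f], finish[s][f-1]) + t[s]` models a thread-per-system pipeline: system `s` on frame `f` starts once the previous system's output for frame `f` and its own work on frame `f-1` are both done. In steady state, the interval between completed frames equals the maximum per-system time.

```python
# Illustrative per-system execution times in ms (made-up numbers).
times = {"A": 1.0, "B": 2.0, "C": 1.5}

# Single-threaded: Time(Frame) = Time(A) + Time(B) + Time(C).
serial_frame = sum(times.values())  # 4.5 ms

def pipeline_finish_times(times, num_frames):
    """Completion time of each system on each frame in the pipelined model."""
    systems = list(times)
    finish = [[0.0] * num_frames for _ in systems]
    for f in range(num_frames):
        for s, name in enumerate(systems):
            prev_system = finish[s - 1][f] if s > 0 else 0.0  # input ready
            prev_frame = finish[s][f - 1] if f > 0 else 0.0   # thread free
            finish[s][f] = max(prev_system, prev_frame) + times[name]
    return finish

finish = pipeline_finish_times(times, num_frames=10)
last = finish[-1]  # completion times of the final system, C, per frame
# Steady-state interval between frames = max(ExecTime(A), ExecTime(B), ExecTime(C)).
steady_interval = last[-1] - last[-2]  # 2.0 ms, the time of system B
```

With these numbers the pipeline more than doubles throughput (2.0 ms per frame versus 4.5 ms), at the cost of three frames of latency and three frames' worth of queued inputs and outputs.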
³ Complex threading systems that more closely resemble tasking can achieve lower frame
latency at the cost of code complexity. There is a continuum of solutions between the rigid
thread-per-system model and the tasking model.